Beowulf Provides Strength in Numbers
Linux clustering means cheap supercomputing
By Eric Foster-Johnson

The idea that you can build your own supercomputer using off-the-shelf components and even old PCs may seem far-fetched, but it's happening today with Linux clusters.

A cluster is a group of computers bound together. The computers in the cluster work together, usually to solve mathematical problems or to provide support in case some computers in the cluster fail.

The main distinction among the different types of clusters is how tightly the computers are coupled. Many operating systems support clustering, including VMS, Windows 2000, and Linux, where interest in clustering has surged in recent months.

Economics is a key factor in the popularity of Linux clustering. When scientific labs request bids for new systems, most want as many COTS (commercial off-the-shelf) systems as possible. The rationale is that commercially available and commercially supported systems will be cheaper, both in the short run with the initial purchase and in the long run with the ability to add computers to the cluster. And this rationale has proven true.

Clusters of off-the-shelf PCs can often meet or exceed the performance of terribly expensive commercial supercomputers, at one-third to one-tenth of the price. Linux has proven popular as an operating system, given that it's free. This means that the main costs are for the computer and networking hardware. And, using off-the-shelf components and PCs means that the hardware costs are lower than if you purchased specialized equipment.

Most of these new Linux clusters follow a setup pioneered at the Beowulf project. The Beowulf project provides extensive information on how you can set up your own cluster using off-the-shelf components and free software.

The Beowulf software environment itself is distributed as a set of software patches and add-ons that work with most Linux distributions, although the Red Hat distribution seems to be a common base. You can download this software from the Beowulf download page.

You can also download the source code to all these patches and add-ons, or download the environment as a set of prebuilt binary packages, in RPM (Red Hat Package Manager) format, for Red Hat Linux systems.

Under the Hood

Beowulf is a type of cluster that isn't as tightly coupled as parallel computers (such as the parallel Cray systems) but is more tightly coupled than plain old networks of workstations. Unlike networks of workstations, each computer in a Beowulf cluster is dedicated to the tasks performed by the cluster.

That means each computer in the cluster performs only the work assigned by the cluster, instead of serving as a general-purpose computer. Typically, the computers in a Beowulf cluster are kept in a server room and you access the entire cluster through a single machine. In a more loosely coupled network of workstations, each computer on the network can participate in shared tasks as well as perform the tasks done by normal computers, such as word processing.

On top of the Beowulf cluster, you can run parallel-processing software such as Parallel Virtual Machine (PVM) or Message Passing Interface (MPI) technologies. Both PVM and MPI provide ways for parts of a parallel system to communicate with the other parts.
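To give a feel for what "communicating with the other parts" means, here is a minimal sketch of the message-passing pattern. It is not PVM or MPI code: it assumes Python threads and in-memory queues on a single machine as a stand-in for cluster nodes and network messages, and the names `worker` and `run` are made up for this illustration.

```python
import queue
import threading

def worker(rank, inbox, outbox):
    """Stand-in for one node: receive data, compute, send the result back."""
    numbers = inbox.get()               # blocking "receive" from the coordinator
    outbox.put((rank, sum(numbers)))    # "send" the partial result

def run(data, n_workers=2):
    """Coordinator: hand each worker a share of the data and collect replies."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], outbox))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Send every n-th element to worker n (a simple round-robin split).
    for rank, inbox in enumerate(inboxes):
        inbox.put(data[rank::n_workers])
    # Collect one reply per worker, in whatever order they arrive.
    partials = dict(outbox.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return partials

partials = run(list(range(1, 11)))
total = sum(partials.values())
```

On a real cluster the queues would be replaced by the send and receive calls of a message-passing library, and each worker would run on its own machine.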

MPI is a standard used on most parallel supercomputers. An implementation of MPI called local-area multicomputer (LAM) is highly popular in the Beowulf community. LAM is available from the University of Notre Dame.

The Beowulf type of cluster works best for problems that can be divided into small pieces for individual computation; in other words, the types of problems that work well for parallel computing. These problems include predicting the weather, rendering images from computer geometry, and molecular analysis.
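As a rough illustration of dividing a problem into pieces, the sketch below estimates pi by numerical integration and splits the work into independent chunks. The example problem, the function names, and the use of a Python thread pool are all assumptions made for illustration; the thread pool only stands in for cluster nodes and shows the decomposition, not a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """Integrate 4 / (1 + x^2) over one slice of [0, 1] with the midpoint rule."""
    start, stop, steps = bounds
    width = 1.0 / steps
    return width * sum(
        4.0 / (1.0 + ((i + 0.5) * width) ** 2) for i in range(start, stop)
    )

def split(steps, pieces):
    """Cut the range 0..steps into at most `pieces` independent chunks."""
    size = math.ceil(steps / pieces)
    return [(lo, min(lo + size, steps), steps) for lo in range(0, steps, size)]

def estimate_pi(steps=100_000, pieces=8):
    """Farm the chunks out to workers and add up the partial results."""
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        return sum(pool.map(partial_sum, split(steps, pieces)))

pi_estimate = estimate_pi()
```

Each chunk needs nothing from the others until the final sum, which is what makes a problem like this a good fit for a cluster.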

Traditional software, such as relational databases or Web servers, won't get any speedup from a Beowulf cluster without a lot of modifications to the programs' source code. To really make use of Beowulf, you need an application that has been written for a parallel computing environment. These applications, while highly specialized, are readily available, especially at academic centers.

A number of academic centers use Beowulf clusters, including the University of Minnesota at Duluth, and the Physics department at the University of Wisconsin-Milwaukee.

The UW-M work, for example, is devoted to the detection and study of gravitational waves. Its Beowulf cluster analyzes gravitational data, a computationally intensive task. Interestingly, UW-M decided on using a Beowulf cluster after a benchmark test showed that these clusters are the most cost-effective way to analyze the data.

Interest in commercial usage has also grown, with companies such as DoubleClick running Beowulf clusters to analyze vast quantities of data mined from user behavior.

Part of the surge in interest comes from how well Beowulf clusters seem to perform with very low hardware costs. But interest in Beowulf clusters really exploded in early 1999 when IBM demonstrated a cluster built from 17 Netfinity servers (with a total of 36 Pentium II processors) and off-the-shelf versions of Red Hat Linux.

This cluster tied the performance of a parallel Cray T3E-900-AC64 supercomputer on the POVRay benchmark, a ray-tracing benchmark used to test the speed of rendering images. The total cost of the IBM system was about $150,000, compared to $5.5 million for the Cray system. With these results, the economics of Beowulf clusters became clear.
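The arithmetic behind that comparison is simple enough to check. This short sketch uses only the two approximate prices quoted above; the variable names are made up for the example.

```python
ibm_cluster_cost = 150_000    # dollars, approximate figure quoted above
cray_cost = 5_500_000         # dollars, figure quoted above

# How many times more the Cray cost than the cluster.
price_ratio = cray_cost / ibm_cluster_cost

# The cluster's price as a fraction of the Cray's price.
cluster_share = ibm_cluster_cost / cray_cost

print(f"The Cray cost about {price_ratio:.1f} times as much as the cluster.")
print(f"The cluster cost about {cluster_share:.1%} of the Cray's price.")
```

Because the two systems tied on the benchmark, the price ratio of roughly 37 to 1 is also the difference in price for equal performance on this particular test.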

Looking at the POVRay benchmark results, nine of the top 10 systems run Linux. (The aforementioned Cray T3E, tied for second place, runs the UNICOS operating system.)

In addition to the POVRay benchmark, Beowulf clusters have performed well on the Linpack Benchmark, one of many benchmarks you can run against computers or clusters of computers. According to the Top 500 site, which lists the top 500 supercomputers based on their performance on the Linpack Benchmark, a number of supercomputers on the list are Beowulf clusters. Interest has grown so much that there's even a Web site devoted to Beowulf news.

With all this interest, a number of companies are trying to get in on the action. Compaq, which inherited the Alpha processor when it purchased Digital Equipment, has provided a lot of support for Linux clusters, especially clusters of systems using Compaq's Alpha processors.

Furthermore, Compaq produced a special Cluster Management Utility that allows users to more easily manage Beowulf clusters.

In addition to Compaq and IBM, a number of smaller vendors, including Paralogic, offer prebuilt Beowulf Linux systems.

You can even build your own Beowulf cluster, and there's a quick-start guide at the Xtreme Machines site. All you need are a few Linux-compatible PCs, a fast Ethernet switch, and a good bit of experience with Linux or UNIX system administration. This isn't a plug-and-play type of setup.

But lots of people have set up Beowulf clusters, leading to a number of at-home supercomputing Web pages, including Cris.com. These pages describe how you can use off-the-shelf and even old PCs to create Beowulf clusters for performing complex mathematical computations.

This technology works best, though, with modern PCs, especially PCs sporting two or more processors. That does raise the price for curious home users, but a large Beowulf cluster should cost from one-third to one-tenth the price of a commercial parallel supercomputer. Even so, this really is a technology for organizations, not home hobbyists.

If you have computing needs that match what a Beowulf cluster can provide, including rendering images or mathematical number crunching, then it's well worth your while to look into this technology. If you're not familiar with Linux administration, your best bet is likely purchasing a system from IBM, Compaq, Paralogic, or other vendors of prebuilt Beowulf systems.

For a technical introduction to the Beowulf software and its history, see the Beowulf introductory page. There are also a number of FAQ lists: www.dnaco.net/~kragen/beowulf-faq.txt and smile.cpe.ku.ac.th/tools/bwfaq2.htm.

From the latter list, you can find links to a host of applications written for Beowulf clusters, links to compiler vendors for tools to parallelize your programs so they take advantage of the clusters, and a lot more documents on how to get going. Finally, if you have a Linux system and you installed the how-to documents, you should be able to find a Beowulf how-to already online on your system.

Contributing Editor Eric Foster-Johnson (erc@pconline.com) has written 15 books on Linux, UNIX, programming, and open-source tools.

Sidebar

Reclaim Unused Processing Cycles With SETI@home

Beowulf clusters are not the only way to get work done. The SETI@home project demonstrated another, more loosely coupled form of clustering for Linux, Windows, or any other operating system.

SETI (the Search for Extraterrestrial Intelligence) is an ongoing effort devoted, in this instance, mainly to examining data from radio telescopes for evidence of intelligent signals. Examining these radio telescope signals is a computationally intensive task, but one that can easily be broken down into smaller pieces that can be computed independently.

Furthermore, the SETI projects don't have enough computing power to perform the necessary tasks, nor a budget large enough to buy the needed computing power. So, the SETI@home software attempts to enlist users on home (or office) PCs and have each PC perform small chunks of the overall project.

Cleverly designed as a screen saver, the SETI@home software takes advantage of unused computing cycles on your PC to download data from a central location, perform the calculations, and upload the results.
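The download, compute, and upload cycle can be sketched in a few lines. This is not the SETI@home software or its protocol: the `WorkServer` class, the `analyze` placeholder, and the idle check are all hypothetical stand-ins, with the central server simulated in memory.

```python
from collections import deque

class WorkServer:
    """Hypothetical in-memory stand-in for the central data server."""

    def __init__(self, work_units):
        self.pending = deque(enumerate(work_units))
        self.results = {}

    def download(self):
        """Hand out the next (id, data) work unit, or None when none are left."""
        return self.pending.popleft() if self.pending else None

    def upload(self, unit_id, result):
        """Record the result for a finished work unit."""
        self.results[unit_id] = result

def analyze(samples):
    """Placeholder computation: report the strongest sample in the chunk."""
    return max(samples)

def run_while_idle(server, is_idle):
    """Process work units for as long as the PC stays idle."""
    done = 0
    while is_idle():
        unit = server.download()
        if unit is None:
            break
        unit_id, samples = unit
        server.upload(unit_id, analyze(samples))
        done += 1
    return done

server = WorkServer([[1, 5, 2], [7, 3], [4, 4, 9]])
idle_checks = iter([True, True, False])   # the user returns after two units
completed = run_while_idle(server, lambda: next(idle_checks))
```

Here the simulated PC finishes two work units before its user returns, and the third unit stays on the server for some other machine to pick up.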

Thousands of PCs work as part of the loose cluster during different times of the day, that is, whenever the SETI@home screen saver is running. Depending on how users do their work, each PC may effectively join and disengage from the cluster many times a day.

Since it runs only as a screen saver on lots of different PCs all over the world, the SETI@home cluster can only really be considered a loose network of workstations, but it certainly helps get the job done (and increases awareness about the project).

You can download the SETI@home software from its Berkeley-hosted page. The software runs on Linux, most versions of UNIX, Windows, and Mac OS systems.

 
 
Copyright © 1994-2009 ComputerUser, Inc., All Rights Reserved All marks are trademarks of ComputerUser Media.
Reproduction in whole or in part in any form or medium without express written permission of ComputerUser, Inc. is prohibited.