Chokchai Leangsuksun


Toward Highly Available Linux Clusters

Synergy between HA and HPC leads to advancements in industry, academic, and research ventures

This article describes the HA-OSCAR architecture and features, and demonstrates how to set up a highly available Linux cluster using the first beta release of HA-OSCAR version 1.0.

In 2002, Ibrahim Haddad, Chokchai Leangsuksun, and Stephen L. Scott established the HA-OSCAR (High Availability OSCAR) project with a primary goal of leveraging the existing OSCAR (Open Source Cluster Application Resources) technology while providing high-availability and scalability capabilities for Linux clusters. The OCG (Open Cluster Group) recognized the project as an official working group, along with the current OSCAR and Thin-OSCAR (diskless cluster) working groups.

The anticipated users of the HA-OSCAR technology are members of the telecommunications industry and other industries looking to deploy highly available Linux-based clusters such as ISPs, ASPs, and HPC sites.

HA-OSCAR introduces several enhancements and new features to OSCAR, mainly in the areas of availability, scalability, and security. Most of the new features may be mapped to ITU, TMN, and FCAPS. These are concepts that the telecom industry has widely adopted to manage network elements.

The HA-OSCAR Project
In the last few decades, HA computing has played a critical role in industry mission-critical applications. On the other hand, HPC has been an equally significant enabler to the R&D community for their scientific discoveries through computation-intensive simulations. The development of a beneficial synergy between HA and HPC will clearly lead to even more advances for industry, academic, and research entities.

HA-OSCAR is an open source project that aims to provide the combined power of a high-availability and high-performance computing solution. Our goal is to enhance a Beowulf cluster system for mission-critical applications. To achieve high availability, component redundancy is adopted in the HA-OSCAR cluster to eliminate single-point-of-failure problems. HA-OSCAR also incorporates a self-healing mechanism, failure detection and recovery, automatic failover, and fail-back.

The primary goal of the HA-OSCAR group is to leverage existing OSCAR technology while providing for new high-availability capabilities in OSCAR-built clusters.

The founding and main organizational contributors to the HA-OSCAR project are:

  • The eXtreme Computing Research group from Louisiana Tech University
  • Oak Ridge National Laboratory, Network and Cluster Computing group of the Computer Science and Mathematics Division
  • The Open System Lab from Ericsson Corporate Unit of Research

HA-OSCAR Architecture
Figure 1 shows the architecture of HA-OSCAR.


The various components follow:

  • A primary server: Responsible for receiving and distributing the requests to specified clients. Each server has three network interface cards: one is connected to the Internet by a public network address and the other two are connected to a private LAN, which consists of a primary Ethernet LAN and a standby LAN. Each LAN consists of network interface cards and a network switch. Each provides communication between servers and clients, and between the primary server and the standby server. At regular intervals, the server polls all the LAN components specified in the cluster configuration file, including the primary LAN cards, the standby LAN cards, and the switches, to collect cluster information.
  • A standby primary server: Monitors the primary server and, when a failure in the primary server is detected, activates its services and takes over.
  • Head nodes configuration: The current version supports Active/Hot standby for the head nodes. We plan to support Active/Active multihead architecture in a future release. The Active/Active architecture will better utilize resources since both head nodes can be simultaneously active to provide services. The dual master nodes run redundant DHCP, NTP, TFTP, NFS, and SNMP servers. In the event of a head node outage, all functionalities provided by that node will fail over to the second redundant head node and will be served at a reduced performance rate (i.e., in theory, 50% at peak or high-traffic hours).
  • Two redundant LAN connections: Heartbeat messages are transmitted periodically, at a configurable interval, across the Ethernet LAN between the two servers to monitor the health of the primary server. The default heartbeat period is five seconds. A high-availability network, via redundant Ethernet ports, is available on every machine, in addition to duplicate switching fabrics (network switches, cables, etc.) for the entire network configuration. This places every node in the cluster on two or more data paths within its networks, and this Ethernet redundancy gives the cluster higher network availability. When both networks are available, the cluster can achieve improved communication performance by using techniques such as channel bonding across the redundant communication paths.
  • Multiple clients: Compute nodes dedicated to computation.
  • Adaptive recovery agent: Critical resources and services such as the MAUI scheduler, PBS daemon, and file systems are monitored and handled by a recovery agent in an outage situation.

Failure and Recovery Scenarios
Primary Server Failure

When a primary server failure occurs, the heartbeat detection on the standby server stops receiving response messages from the primary server. After a prescribed time, the standby server takes over the alias IP address of the primary server, and control of the cluster transfers from the primary server to the standby server. When the failover completes, users' requests are processed on the standby server until the primary server is repaired and failback is complete. The operation is transparent to users, and the transfer is virtually seamless except for the brief prescribed time. The failed primary server can be repaired after the standby server takes over control. Once the repair is completed, the primary server activates its services, regains the alias IP address, and begins to process users' requests. The standby server releases the alias IP address and returns to its initial standby state.
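
The takeover and failback decisions described above can be sketched as a small state machine on the standby server. This is a minimal illustration, not HA-OSCAR's actual code: the class name, the action strings, and the value of MISSED_BEATS_ALLOWED are assumptions, since the article gives only the five-second default heartbeat period and does not state the length of the "prescribed time".

```python
from dataclasses import dataclass

HEARTBEAT_PERIOD_S = 5.0   # default heartbeat period stated in the article
MISSED_BEATS_ALLOWED = 3   # hypothetical: the "prescribed time" is not specified


@dataclass
class StandbyMonitor:
    """Hypothetical model of the standby server's heartbeat watcher."""
    last_heartbeat: float      # time (seconds) the last heartbeat arrived
    active: bool = False       # True once the standby has taken over

    def on_heartbeat(self, now: float) -> None:
        # Record that the primary server answered at time `now`.
        self.last_heartbeat = now

    def tick(self, now: float) -> str:
        """Return the action the standby should take at time `now`."""
        timeout = HEARTBEAT_PERIOD_S * MISSED_BEATS_ALLOWED
        silent_for = now - self.last_heartbeat
        if not self.active and silent_for > timeout:
            self.active = True
            return "takeover"   # claim the alias IP and start services
        if self.active and silent_for <= timeout:
            self.active = False
            return "failback"   # release the alias IP, return to standby
        return "none"
```

In this sketch a single resumed heartbeat triggers failback; a real deployment would more likely wait until the repaired primary server has restarted its services.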

Network Connection Failure
Network connection failures are detected in the following manner:

  1. The standby LAN interface is assigned to be the poller.
  2. The polling interface sends packet messages to all other interfaces on the LAN and receives packets back from all other interfaces on the LAN.
If an interface cannot receive or send a message, the numerical count of packets sent and received on an interface does not increment for a set amount of time. At this point, the interface is considered to be down.
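
The counter-based rule above can be illustrated with a short sketch. The function name, the sample format, and the stall window are assumptions made for illustration; the article does not give the polling interval or the "set amount of time".

```python
def interface_is_down(samples, stall_window_s):
    """Decide whether an interface should be considered down.

    samples: list of (timestamp_s, packet_count) tuples, oldest first, where
             packet_count is the running total of packets sent and received.
    stall_window_s: hypothetical threshold; the interface is reported down
             when the count has not incremented for at least this long.
    """
    if not samples:
        return False  # nothing observed yet, so no basis for a decision
    latest_time, latest_count = samples[-1]
    # Walk backward to find how long the counter has held its latest value.
    stalled_since = latest_time
    for timestamp, count in reversed(samples):
        if count != latest_count:
            break
        stalled_since = timestamp
    return latest_time - stalled_since >= stall_window_s
```

For example, with samples taken every five seconds, a counter that reads 12 at three consecutive polls has been stalled for ten seconds and would be reported down under a ten-second window.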

Service Outages
Several critical services such as PBS, MAUI, NFS, and DHCP are monitored by the HA-OSCAR service-monitoring daemon. Users can configure HA-OSCAR to monitor additional services (see "HA-OSCAR Health Monitoring and Configuration"). Once a monitored service failure occurs, HA-OSCAR will trigger a corresponding service recovery procedure. The recovery handler then decides which action to take based on the service's previous outage state (stored in a history log) and the current event type. When the number of "unsuccessful retries" reaches a threshold level, a stronger action may be necessary, such as failover or notifying an operator. Repeated failures of this kind may be caused by system resource conflicts or a delayed service response. In such cases, HA-OSCAR can trigger a system shutdown procedure for a server failover. In addition, any failover may trigger e-mail notification and the paging of an operator.
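
The escalation logic of the recovery handler might look roughly like the following. This is a sketch under stated assumptions: the history log is modeled as a plain dictionary of consecutive failed retries, and the threshold value, function names, and action strings are hypothetical rather than taken from HA-OSCAR.

```python
RETRY_THRESHOLD = 3  # hypothetical; the article does not give the threshold


def choose_recovery_action(service, history, threshold=RETRY_THRESHOLD):
    """Pick an action from the number of unsuccessful retries so far.

    history maps a service name to its count of consecutive failed
    recovery attempts (a stand-in for the history log).
    """
    failed_retries = history.get(service, 0)
    if failed_retries >= threshold:
        # Restarting has not helped; escalate to a server failover
        # and notify an operator.
        return "failover_and_notify"
    return "restart_service"


def record_outcome(service, history, recovered):
    """Update the history after a recovery attempt."""
    if recovered:
        history[service] = 0
    else:
        history[service] = history.get(service, 0) + 1
```

Keeping the counter per service means that a flapping PBS daemon can escalate to failover without being masked by successful NFS restarts.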

HA-OSCAR First Beta Release
The first HA-OSCAR Beta release aims to reduce unplanned downtime in mission-critical HPC and scalable applications. It supports a Linux-based HA cluster computing infrastructure based on OSCAR, the toolkit developed to simplify the installation, configuration, and management of Beowulf clusters. HA-OSCAR eliminates several single points of failure at the server level through component redundancy, outage detection, and recovery mechanisms. A standby server is introduced as a duplicate of the primary server to take over control of the cluster if the primary server fails. As a redundant copy of the primary server, the standby server is required to have an identical hardware and software configuration. SystemImager is the tool used to handle the server image duplication process.

Preparatory Steps
This section enumerates all the important steps that users should follow to install HA-OSCAR successfully.

Platform Preparation
The cluster should have multiple client nodes and two server nodes. All nodes should be equipped with Ethernet adapters, and there must be at least two Ethernet cards on the servers. In addition, the servers' hardware must be identical. Currently, we separate HA-OSCAR installation from a typical cluster-building tool. We envision that users should be able to either retrofit their normal cluster with our HA enhancement or build an HA-OSCAR cluster from scratch. Since several excellent cluster build systems exist, including OSCAR, ROCKS, and Scyld, and any of them may be used to build a Beowulf-class cluster, we assume that the user will use one of these tools to prepare the cluster platform before installing HA-OSCAR. For a standard Beowulf cluster installation, please refer to the OSCAR manual or the manual of a similar tool.

HA-OSCAR Installation
Based on the objectives of ease of cluster installation and management, we have developed an easy-to-install package with a GUI. Starting with a system that has the standard OSCAR cluster software stack installed, a user may request to download the HA-OSCAR package. As root, the user first needs to untar the package and then start the HA-OSCAR installation wizard with the following command:

% ./haoscar <interface>

Note that the interface argument is the private network interface of the primary head node. The HA-OSCAR installation wizard will walk the user through a complete installation process. Information regarding the standard OSCAR installation may be found in the documentation on the OSCAR Web site.

The following sections describe the HA-OSCAR installation process, including the associated screen display.

HA-OSCAR Installation Wizard
Step 1, "HA-OSCAR Packages Installation...", of the wizard-based process will install all the required packages to the OSCAR cluster server (head node) and prepare the environment for the remaining three installation steps.

Fetching (Cloning) Image for Standby Server
Before pressing the "Building Image for Standby server" button (step 2, shown in Figure 2), ensure that your system meets the following criteria:

  • The SSH daemon's configuration file (/etc/ssh/sshd_config) on the server has PermitRootLogin set to "yes". After the HA-OSCAR installation, you may set this back to "no", but it must be set to "yes" during the install because the config file is copied to the Standby server, and root must be able to log in to the Standby server remotely.
  • /etc/hosts.allow and /etc/hosts.deny files allow all traffic from the entire private subnet.
  • Firewall policy must not restrict traffic in the private subnet.
The installation may fail without the above preconditions.
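
The first precondition can be checked before starting the wizard. The helper below is a hypothetical sketch, not part of HA-OSCAR: it scans the text of an sshd_config file for PermitRootLogin, relying on the documented sshd behavior that keywords are case-insensitive and that the first value obtained for a keyword is the one used. It ignores Match blocks and treats a missing directive as "not enabled", which is deliberately conservative.

```python
def permit_root_login_enabled(sshd_config_text):
    """Return True only if PermitRootLogin is explicitly set to "yes".

    Hypothetical helper: a simple line scan that skips blank lines and
    comments and does not interpret Match blocks.
    """
    value = None
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].lower() == "permitrootlogin":
            if value is None:  # sshd uses the first value it obtains
                value = parts[1].strip().lower()
    return value == "yes"
```

A typical use would be to read /etc/ssh/sshd_config as root and stop with a clear message if the function returns False, rather than discovering the problem when the image fetch fails.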

Now you can select the button "Building Image for Standby server" (step 2, shown in Figure 2), and the wizard pops up another window (see Figure 3) where you can specify a server image name. This is an important step in cloning a standby server image from the primary server. For a more stringent downtime requirement, we recommend a separate image server, not one of the cluster head nodes, to serve as an independent image repository for the disaster recovery process.


Normally you can simply retain the default values here and just press the "Fetch Image" button (shown in Figure 3) to get an image for the Standby server. This step will take several minutes to complete. However, if the default values do not meet your specific needs, you may change such values as client name, SSH User Name, IP Assignment Method, and Post Install Action. After a few minutes, a popup window will indicate the status of the image fetch (success or failure). The installation log file ./root/ha-oscar-1.0/ha-oscarinstall.log will contain information regarding a failed fetch. Common causes of a failure include not meeting the prerequisites or running out of disk space.

Configuring the Standby Server
Pressing the "Configuration for Standby server on" button will pop up a dialog box like the one shown in Figure 4. This step is similar to that used for defining clients in the standard OSCAR installation wizard. However, because the meaning of some of the buttons differs with respect to Standby servers versus compute nodes, we discuss each of these here.

  • Image Name: This is where you specify the image fetched in the previous step. (Do not select the image name for an OSCAR client node.)
  • Domain Name: This is for the server node's domain (if it has one); if the server does not have a domain name, the name oscardomain is entered by default.
  • Base Name: Specifies the first part of the Standby server's name. It will ultimately have an index automatically appended to the end of it.
  • Number of Hosts: Specifies how many Standby servers to create. Default value for the Number of Hosts is one (1), which means only one Standby server is created. You should not change this value unless you desire to build more than one Standby server. Note that only one Primary/Standby server pair is active at any time in the current implementation.
  • Starting Number: The index to append to the Base Name when deriving the Standby server name.
  • Padding: The number of digits to which the index in the Standby server name is padded, e.g., three digits would yield oscarserver001. The default is 0, which means no padding between the base name and the number (index). Note: Padding with your maximum number of digits will make sequencing hostnames easier.
  • Starting IP: Specifies the IP address of the Standby server. Make sure that this value does not conflict with the IP addresses of any other nodes on your network.
  • Subnet Mask: The IP netmask for all clients. A class C netmask should be sufficient for an HA-OSCAR cluster.
  • Default Gateway: Specifies the default route for Standby server.
Press the "Addclients" button when finished entering information. A popup window will indicate the completion status of the Addclient process.
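
To make the relationship between Base Name, Starting Number, Padding, and Starting IP concrete, the naming rule can be sketched as follows. The function is hypothetical, the IP address in the usage note is only an example, and the assumption that additional Standby servers receive consecutive addresses after Starting IP is ours, not something the article states.

```python
import ipaddress


def standby_hosts(base_name, count, starting_number, padding, starting_ip):
    """Derive (hostname, ip) pairs from the dialog fields.

    Assumption: when more than one host is requested, IP addresses are
    handed out consecutively from starting_ip.
    """
    first_ip = ipaddress.IPv4Address(starting_ip)
    hosts = []
    for offset in range(count):
        index = starting_number + offset
        if padding:
            # Zero-pad the index to `padding` digits, e.g. 1 -> "001".
            name = f"{base_name}{index:0{padding}d}"
        else:
            name = f"{base_name}{index}"
        hosts.append((name, str(first_ip + offset)))
    return hosts
```

With base name oscarserver, one host, starting number 1, and padding 3, the sketch yields the name oscarserver001 from the Padding example above, paired with whatever Starting IP was entered.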

Standby Server MAC Address Collection
Because HA-OSCAR installation uses some SIS modules that are also used by the standard OSCAR installation, these modules share the same database information. In this step, pay attention to assigning the Standby server's MAC address to the corresponding IP address defined in the previous section. That MAC address uniquely identifies the Standby server on your network, and DHCP will use that MAC address to assign an IP address to the Standby server. The following section describes the buttons on the address collection screen shown in Figure 5. The logical progression here is to collect - assign - repeat until done - close.

  • Build Autoinstall Floppy: May be used to build a boot floppy for a Standby server that does not support PXE (network) boot.
  • Setup Network Boot: Will configure the Standby to answer a PXE boot request if its hardware supports that option.
  • Collect MAC Address: Will initiate the collection of the MAC addresses. Each machine must be booted via either network or floppy for this process to occur.
  • Stop Collecting MACs: Terminates the MAC collection process.
  • Assign MAC to Node: First click on a MAC address, highlighting it on the left side of the screen, then click on the assign button to assign that MAC address to the next available node in the client list on the right side of the screen.
  • Dynamic DHCP Update: If selected (the default), the DHCP server configuration will be refreshed each time a MAC is assigned.
  • Configure DHCP Server: This option must be selected to make the association between the MAC and IP address if the Dynamic DHCP Update is off.
  • Close: Selected once all addresses have been selected for the Standby servers.
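
The association between a MAC address and an IP address is what lets DHCP give the Standby server a fixed address. As an illustration only, the sketch below renders a host entry in the style of an ISC dhcpd configuration; whether HA-OSCAR's generated DHCP configuration looks exactly like this is an assumption, and the function name and sample values are hypothetical.

```python
import re

# Six colon-separated pairs of hexadecimal digits.
MAC_PATTERN = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def dhcp_host_entry(hostname, mac, ip):
    """Render an ISC dhcpd-style host stanza binding a MAC to a fixed IP."""
    mac = mac.lower()
    if not MAC_PATTERN.match(mac):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f"}}\n"
    )
```

Calling dhcp_host_entry("oscarserver001", "00:11:22:33:44:55", "10.0.0.200") produces a four-line stanza naming the host, its hardware address, and its fixed address.
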

Completing the HA-OSCAR Installation
At this point, the Standby server software has been installed but is not yet configured. The Standby server should be rebooted from its own hard drive in preparation for the configuration process. The HA-OSCAR configuration process is described in the next section.

HA-OSCAR Health Monitoring and Configuration
HA-OSCAR supports a system resource and outage monitoring and recovery mechanism. It provides a Web-based service monitoring and configuration program, which can be used to configure, monitor, and manage services and resources on HA-OSCAR. Figure 6 shows the panel where you can add new services and change monitoring parameters.


As shown in Figure 7, you can easily add alias IPs for the two servers that are charged with monitoring each other's health. Furthermore, this module will automatically generate all necessary alerts and monitoring files for HA-OSCAR, thereby preventing some user errors. As we have shown thus far, our solution is easy to install and configure because much of the underlying process is handled behind the scenes by scripts.


HA-OSCAR Roadmap and Plans
The current HA-OSCAR supports the active/hot-standby dual-head architecture. The additional head results in a considerable availability improvement. However, when the cluster head node has a highly interactive workload, this type of load profile may cause performance problems at the head node. Our plans are to extend both the HA and performance aspects of HA-OSCAR in the coming year by implementing an active-active HA-OSCAR architecture. Furthermore, in future releases, we plan to expand HA-OSCAR to address planned downtime issues. We anticipate one major release on a yearly basis, with minor releases throughout the year as progress continues.
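
A rough sense of why a second head node improves availability comes from the textbook formula for redundant components. The sketch below is not a model of HA-OSCAR itself: it assumes that head-node failures are independent and that failover is instantaneous and always succeeds, and the 0.99 figure in the usage note is a made-up example rather than a measurement.

```python
HOURS_PER_YEAR = 8760  # 365 days


def redundant_availability(single_availability, heads=2):
    """Availability of `heads` redundant nodes under idealized assumptions.

    The service is down only when every head is down at once, assuming
    independent failures and perfect failover.
    """
    return 1.0 - (1.0 - single_availability) ** heads


def downtime_hours_per_year(availability):
    """Expected unavailable hours in one year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR
```

Under these assumptions, a single head node with a hypothetical availability of 0.99 is down for roughly 87.6 hours a year, while a redundant pair reaches about 0.9999, or under one hour a year. Real figures will be lower because detection and takeover take time.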

The HA-OSCAR initiative was established to build a highly available version of OSCAR that is not restricted to only HPC applications. HA-OSCAR is also suitable for other types of applications such as Web services and telecom applications. In this article we focused on the installation process; in a following article we plan to describe the various usage models of the HA-OSCAR cluster, for example, an HA Web cluster. If you would like to become involved with the HA-OSCAR initiative, please contact any of the authors.

ASP: Application Service Providers
DHCP: Dynamic Host Configuration Protocol
FCAPS: Fault-management, Configuration, Accounting, Performance, and Security
HA: High Availability
HA-OSCAR: High-Availability OSCAR
HPC: High Performance Computing
ISP: Internet Service Providers
ITU: International Telecommunication Union
LAN: Local Area Network
MAC: Media Access Control
OCG: Open Cluster Group
OSCAR: Open Source Cluster Application Resources
NFS: Network File System
NIC: Network Interface Card
NTP: Network Time Protocol
R&D: Research and Development
SIS: SystemImager - Image Based Installation and Maintenance Tool and LUI - Resource Based Cluster Installation Tool
SNMP: Simple Network Management Protocol
TFTP: Trivial File Transfer Protocol
Thin-OSCAR: Diskless OSCAR version
TMN: Telecommunications Management Network


    More Stories By Ibrahim Haddad

    Ibrahim Haddad is a member of the management team at The Linux Foundation responsible for technical, legal and compliance projects and initiatives. Prior to that, he ran the Open Source Office at Palm, the Open Source Technology Group at Motorola, and Global Telecommunications Initiatives at The Open Source Development Labs. Ibrahim started his career as a member of the research team at Ericsson Research focusing on advanced research for system architecture of 3G wireless IP networks and on the adoption of open source software in telecom. Ibrahim graduated from Concordia University (Montréal, Canada) with a Ph.D. in Computer Science. He is a Contributing Editor to the Linux Journal. Ibrahim is fluent in Arabic, English and French. He can be reached via

    More Stories By Chokchai Leangsuksun

    Chokchai Leangsuksun is an Associate Professor in Computer Science in the Center for Entrepreneurship and Information Technology (CEnIT) at Louisiana Tech University. Prior to his career in academia, he spent seven years in R&D with Lucent Technologies in system reliability, high availability computing and telecommunication systems.

    More Stories By Stephen L. Scott

    Stephen L. Scott is a senior Research Scientist in the Computer Science and Mathematics Division of Oak Ridge National Laboratory, USA. He is a founding member of OCG and is presently the version 2 release manager. Previously, he held the position of working group chair of the OSCAR.

    More Stories By Tong Liu

    Tong Liu, HPC Advisory Council Cluster Center Manager, is responsible for application performance characterization and benchmarking with high-speed interconnects. Before joining the HPC Advisory Council, he was a senior software design engineer at Hewlett-Packard where he led distributed data warehouse development.

    Prior to HP, he was a systems engineer and advisor in the Scalable Systems Group at Dell Inc. He has published more than 20 publications in the High Performance Computing field. Mr. Liu holds an MS in Computer Science from Louisiana Tech University.
