
Building an Ubuntu Computing Cluster

First off, this is a work in progress, so it will take a while until it is complete. I will only write up what succeeded; the hours spent chasing dead ends will not be mentioned. All of the steps described here were tested on Ubuntu Feisty, but should work on other releases as well.

(Patrik: I have added some comments marked like this one).

Right, let's get cracking.

Basics

This is what you need:

  • Cluster hardware, i.e. a lot of computers with similar hardware and memory.
  • A good understanding of Linux and the following technologies:
    • NIS, DHCP, NFS and PXE
  • Patience and/or coffee/tea.

The design I came up with was to use one of the computing nodes as a master node hosting all the servers I needed. In other words, one of the computing nodes will run the DHCP, NIS, TFTP and NFS servers. Sounds like a lot, I know, but you'll see it's all quite necessary. I will make some assumptions in this small tutorial. For instance I will assume that your cluster lives in a C-class network with a server (the first computing node) at 192.168.0.1. I will assume that you have 4 computing nodes in total, since that is what I'm dealing with here. The first node should have two network cards: one for the interwub and one for the local cluster network. Now, since we are building a computing cluster I find it proper to give the nodes their own swap, because I don't want them signing off when memory is scarce. So on each node's disk you should create a partition table with a primary partition for tmp storage and an extended partition (holding a logical partition) for swap. Choose sizes as you like; I made a lot of swap.
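
For reference, once the partitions exist (created interactively with fdisk, for example), formatting them might look like the sketch below. The device names are an assumption that follows the fstab used later in this guide (/dev/sda1 for tmp storage, /dev/sda5 for swap inside the extended partition); adjust them if your disks show up differently.

# run on each node after partitioning the disk
mkfs.ext3 /dev/sda1   # primary partition, will be mounted as /tmp
mkswap /dev/sda5      # logical partition inside the extended one, used as swap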

Getting our hands dirty

Start with installing Ubuntu on one of the nodes using either a cdrom or following the UbuntuViaUSB tutorial. (Patrik and Pontus: We followed Michael's instructions and we managed to build a memory stick that worked perfectly on our desktop computers, but was denied by the cluster. We finally tried this method http://learn.clemsonlinux.org/wiki/Ubuntu:Install_from_USB_drive#The_Quick_Method with a good result. We have absolutely no idea what the problem is/was, but be prepared that this trivial step might take more time than expected). I chose to install only the server edition since a full desktop isn't really required. Now grab some tea and enjoy the installation of Ubuntu on one of the nodes. (Patrik: The installer wants to automatically detect the network hardware. This step takes a very long time and will give you the impression that something is wrong. Leave the room and return after ten minutes). Finished? Good, now we need to install some software.

sudo apt-get install dhcp3-server tftpd-hpa syslinux nfs-kernel-server initramfs-tools

Then we need to set up some PXE booting prerequisites

cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot
mkdir /var/lib/tftpboot/pxelinux.cfg

Now we need to let the DHCP server know which network to offer the PXE boot image to. So edit the DHCP configuration with vi /etc/dhcp3/dhcpd.conf, and fill it with the information below.

ddns-update-style none;

option domain-name "thepcluster.org";
#option domain-name-servers ns1.thepcluster.org;

default-lease-time 600;
max-lease-time 7200;

authoritative;
log-facility local7;

allow booting;
allow bootp;

subnet 192.168.0.0 netmask 255.255.255.0 {
  range 192.168.0.2 192.168.0.4;
  option broadcast-address 192.168.0.255;
  option routers 192.168.0.1;
  option domain-name-servers 192.168.0.1;
  filename "/var/lib/tftpboot/pxelinux.0";
}

(Patrik: This file is SelmaN01-/etc/dhcp3/dhcpd.conf for N01).
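
Once dhcpd.conf is saved, the DHCP server needs to be restarted to pick up the new configuration. On Feisty that should be:

sudo /etc/init.d/dhcp3-server restart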

In order for any of this to matter we need to tell the tftp server to run, I mean actually run, itself when started. Edit /etc/default/tftpd-hpa and set RUN_DAEMON="yes" (a reference snippet follows after the pxelinux config below). After this we need to create a new config file for the PXE thingie so that it will know which kernel to pass to the client. Create the file /var/lib/tftpboot/pxelinux.cfg/default and fill it with the following information.

LABEL linux
KERNEL vmlinuz
APPEND root=/dev/nfs initrd=initrd.img nfsroot=192.168.0.1:/opt/nfsroot ip=dhcp rw

(Patrik: This file is SelmaN01-/var/lib/tftpboot/pxelinux.cfg).
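
For reference, after the edit /etc/default/tftpd-hpa should contain the line below (leave the rest of the file as it was):

RUN_DAEMON="yes"

Then restart the daemon so the change takes effect:

sudo /etc/init.d/tftpd-hpa restart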

The pxelinux configuration above tells the clients to mount the root file system via NFS from server 192.168.0.1, our master node, at /opt/nfsroot. Of course, in order for a client to mount a directory over NFS, a server must first share it. So edit your /etc/exports and tell it how to export our nfsroot by entering

/opt/nfsroot 192.168.0.0/24(rw,no_root_squash,async)
/home 192.168.0.0/24(rw,no_root_squash,async)

(Patrik: The NFS daemon has been updated since Michael wrote the original document. A new option has been added, which must be set to a value or the daemon will print warnings during startup. I added subtree_check to /opt/nfsroot and no_subtree_check to /home, but I was only guessing and I recommend that you read the manpage for exports and configure the system according to your own understanding of how NFS works). As you may have noticed we put the home directory in there as well. The reason for this is that we need to let the NIS users access their home directories from each node on the cluster. More about this later. This might be a good time to actually create the client's root file system and export it.

mkdir -p /opt/nfsroot
exportfs -rv
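
As a quick sanity check, both shares should now show up when you list the exports (showmount comes with the NFS packages installed earlier):

exportfs -v
showmount -e localhost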

Remember the configuration we did in pxelinux.cfg? Well it's time to actually create some of the files mentioned there. For starters we shall create the initrd.img file. Since we don't have a dedicated client set up yet we will have to go through some steps to create it. Edit your /etc/initramfs-tools/initramfs.conf and change BOOT to nfs instead of local. Now make an initrd.img and store it along with the kernel image in /var/lib/tftpboot.

mkinitramfs -o /var/lib/tftpboot/initrd.img
cp /boot/vmlinuz-$(uname -r) /var/lib/tftpboot/vmlinuz

(Patrik: After you have done this, restore /etc/initramfs-tools/initramfs.conf to whatever it was before you modified it. If initramfs-tools ever gets updated, as in my case, apt-get will try to generate a new initramfs for the login node). Now fill the nfsroot directory.

cd /
cp -axv bin home media root srv tmp var boot etc initrd lib lib64 lost+found mnt sbin sys usr /opt/nfsroot/
cp -axv dev /opt/nfsroot
cp -axv proc /opt/nfsroot

(Patrik: Michael forgot the lib64 symlink in the original instructions. Four hours of reading source code and I found the error. There is also a problem copying some of the files in proc. The -v flag to cp will give you a hint which files are causing trouble. I just skipped those files). After this you need to make sure that the fstab in the nfsroot knows what to do. Fill /opt/nfsroot/etc/fstab with the information below. This will use the local disks on the clients for swap and tmp storage. Home will be mounted over NFS since we will configure NIS users.

proc                    /proc           proc    defaults        0       0
/dev/nfs                /               nfs     defaults        0       0
192.168.0.1:/home       /home           nfs     defaults        0       0
/dev/sda1               /tmp            ext3    defaults        0       1
/dev/sda5               none            swap    sw              0       0

(Patrik: This file is SelmaN01-/opt/nfsroot/etc/fstab for N01)

"192.168.0.1?" I hear your inquiring minds cry! Well, the thing is we don't have an actual interface set up on the server yet with an IP of 192.168.0.1, so let's get to it. Fire up vi on /etc/network/interfaces and add the new interface. When your work is done the file should look something like the example below.

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth1
iface eth1 inet dhcp

# The interface for the local cluster network
auto eth0
iface eth0 inet static
        address 192.168.0.1
        netmask 255.255.255.0

I know, I know, setting eth0 as the local interface might not have been entirely transparent. Deal with it. Anyway, now we have our two interfaces active and our primary servers are set up properly. Before we move on we need to make sure that the client root file system is configured properly. We already did the fstab, but we need to make one crucial adjustment to the interfaces file in the nfsroot: comment out all interfaces except lo. If you allow eth0 or eth1 to be brought up you will lose the nfsroot, since the network card was already set up by PXE during the initial boot process.
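
For clarity, /opt/nfsroot/etc/network/interfaces should end up looking something like this (the commented-out stanza is only an example of what to neutralise; keep whatever was copied from the master, just commented out):

# The loopback network interface
auto lo
iface lo inet loopback

# everything else stays commented out: the NIC was already configured
# by PXE/DHCP at boot, and reconfiguring it here would drop the NFS root
#auto eth0
#iface eth0 inet static
#        address 192.168.0.1
#        netmask 255.255.255.0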

Setting up NIS server

By now we should be able to boot the nodes from the network and use the nfsroot as the root file system. Now we need to set up some users, since we don't want to maintain them separately on the real root and the nfsroot. First we need to make our system a bit more secure, since the default for NIS is to allow the whole world access to its services. We accomplish this by setting up /etc/hosts.allow

ALL: LOCAL
ALL: .thep.lu.se
ALL: 192.168.0.0/24

and /etc/hosts.deny

ALL: ALL

where we only allow users from thep.lu.se and the local network to access the server. To actually install NIS we issue

sudo apt-get install portmap nis

and choose the domain name. Note that this is not the real domain name; it's just something that NIS will use to identify groups of clients, so you can basically choose anything you'd like here. However, we shall stick with thepcluster.org as we did when we configured our local interface for the DHCP server. When the installation is done we have one more security thingie to fix: edit /etc/ypserv.securenets and comment out the 0.0.0.0 line. I can't stress the importance of this enough. If you forget, then the whole world has access. So fill it with allowed hosts instead, like:

host 192.168.0.1
host 192.168.0.2
host 192.168.0.3
host 192.168.0.4
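
If you would rather allow the whole cluster subnet in one go, ypserv.securenets also accepts netmask/network pairs, so a single line like the one below should work too (check the comments in the file itself):

255.255.255.0   192.168.0.0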

We want the masternode to know that it is a server. We accomplish that by editing /etc/default/nis and setting NISSERVER=master. Also make sure that /etc/default/portmap has the ARGS="-i 127.0.0.1" line commented out. Next edit /etc/yp.conf and add a server line like

domain thepcluster.org server selma.thepcluster.org

(Patrik: I replaced selma.thepcluster.org with the IP number of the first node. This reminds me to point out that Michael does not mention how to set up the network and name lookup. I set up /etc/hosts according to the manpage of hosts. Before I did this I had huge problems with sshd). Maybe your masternode isn't called selma, and if so you should enter the name of your masternode here. Issue sudo /usr/lib/yp/ypinit -m to build the database. Don't worry about some of the errors that occur. I didn't and our system works just fine. This is an excellent opportunity to add new users to your system, i.e., adduser monkeyboy etc. When all the users are added, issue make -C /var/yp to propagate the changes through the system.
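
Since Patrik brings up /etc/hosts: a minimal hosts file on the master might look like the sketch below. The node names are an assumption that follows the n01..n04 convention used in the Torque section further down; adapt it to your own naming.

127.0.0.1       localhost
192.168.0.1     n01.thepcluster.org     n01     selma
192.168.0.2     n02.thepcluster.org     n02
192.168.0.3     n03.thepcluster.org     n03
192.168.0.4     n04.thepcluster.org     n04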

Setting up NIS client in nfsroot

Setting up NIS on the clients is a picnic. Copy the hosts.allow, hosts.deny and yp.conf from /etc to the nfsroot's etc. Then chroot into the nfsroot by

sudo chroot /opt/nfsroot

and issue

apt-get install nis

and write the same domain name as before. Then add +::::::, +::: and +:::::::: to the end of /etc/passwd, /etc/group and /etc/shadow respectively, as shown below.
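
Still inside the chroot, appending those compat entries can be done like this:

echo "+::::::" >> /etc/passwd
echo "+:::" >> /etc/group
echo "+::::::::" >> /etc/shadow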

Configuring TORQUE Resource Manager

There are a few queuing systems out there today, but we chose Torque, mostly for compatibility reasons. Download it from the Torque site by

wget http://www.clusterresources.com/downloads/torque/torque-2.1.8.tar.gz

and sit back and relax as it downloads. When it's finished you install it a la

tar -zxf torque-2.1.8.tar.gz
cd torque-2.1.8
./configure
make
sudo make install
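
Torque installs its libraries under /usr/local/lib by default, so if the binaries later complain about a missing libtorque, refreshing the linker cache should sort it out:

sudo ldconfig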

Torque has to know which user should hold the queue and thus be its admin; in our case we chose the clusteradmin user. Run

./torque.setup clusteradmin
make packages

from within the torque src folder. This will set up the basic queue and create the packages we need to distribute and install on the clients. Let's install the packages.

cp torque-package-clients-linux-x86_64.sh /opt/nfsroot/tmp
cp torque-package-mom-linux-x86_64.sh /opt/nfsroot/tmp
sudo chroot /opt/nfsroot
cd /tmp
./torque-package-mom-linux-x86_64.sh --install
./torque-package-clients-linux-x86_64.sh --install

(Patrik and Simon: Michael forgot the --install flags, which we have now added to the instructions.) Edit /var/spool/torque/server_name and make sure that it contains the name of the first computing node, i.e., selma in our case. After that, exit the chroot by typing, you got it, exit. Now we're back at the real root, and we want to tell Torque which computing nodes to use. We accomplish this by filling /var/spool/torque/server_priv/nodes with the information

n01.thepcluster.org np=8
n02.thepcluster.org np=8
n03.thepcluster.org np=8
n04.thepcluster.org np=8

where np=8 tells the server that each node has 8 processors. OK, it turns out that it's not enough to set the server_name file in the torque directory. We also have to tell the mom (the per-node daemon) where the server is. We do that by writing

$pbsserver      n01               # note: hostname running pbs_server
$logevent       255               # bitmap of which events to log

in the /var/spool/torque/mom_priv/config file. Modify /opt/nfsroot/etc/rc.local to start the pbs_mom daemon by adding a pbs_mom line before the exit 0 command. An example of the rc.local file is shown below.

# mount -a
# rm -rf /tmp/torque
# cp -a /var/spool/torque_orig /tmp/torque
/usr/local/sbin/pbs_mom
exit 0

(Patrik and Simon: We commented out the first three lines as they only caused errors (and seem to be redundant). We have no idea why Michael put them there). Now fire up all the computing nodes. When they start, we try the Torque server out to make sure that our queue is in order. So terminate and restart the server, check its configuration,

qterm
pbs_server
qstat -q
qmgr -c 'p s'

check the status of all computing nodes, and finally submit a job to the queue.

pbsnodes -a
echo "sleep 60" | qsub
qstat

If everything worked out all right, we are ready to fire up the scheduler by issuing

pbs_sched

and relax. To finish up we make sure that the server, the scheduler and the local mom start automatically at boot time by adding the following section to /etc/rc.local (the master node doubles as a computing node, hence the pbs_mom line).

/usr/local/sbin/pbs_server
/usr/local/sbin/pbs_sched
/usr/local/bin/qmgr -c 'set server query_other_jobs=true'
/usr/local/sbin/pbs_mom
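
After a reboot of the master node, a quick check that all the daemons came back up might look like this:

pgrep -l pbs
pbsnodes -a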

That's it! We have a full-fledged computing cluster that boots from the network by PXE, manages users with NIS and schedules jobs with Torque.

Bonus stuff

I followed this guide to make it possible for the nodes to reach the outside world, which is necessary for Matlab to work.
