Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids, which makes it a good fit for monitoring the nodes of a cluster. Setting up Ganglia on CentOS 7 across several nodes can be confusing, so in this post I’ll show you how to set up Ganglia and its web interface properly. Our cluster has 6 nodes connected through a switch.

Cluster Server and Clients

I configured our nodes with the following hostnames. Our server is:

3.buhpc.com

The clients are:

1.buhpc.com
2.buhpc.com
4.buhpc.com
5.buhpc.com
6.buhpc.com


Installation

On the server, inside our cluster’s shared folder, we will first download Ganglia 3.7.2, the latest version at the time of writing. On our cluster, /nfs is the folder shared over the network file system (NFS).

cd /nfs
wget http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.7.2/ganglia-3.7.2.tar.gz
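
If you want to try a different release, a small helper can build the download URL from the version number. This is a minimal sketch that assumes SourceForge keeps the same folder layout for other 3.x releases; `ganglia_url` and `download_ganglia` are hypothetical names, not part of Ganglia.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, assuming the SourceForge folder layout used above
# also applies to other Ganglia core releases.

# Print the tarball URL for a Ganglia core version such as 3.7.2.
ganglia_url() {
  local version="$1"
  # %% makes printf emit a literal %, giving the %20-encoded spaces.
  printf 'http://downloads.sourceforge.net/project/ganglia/ganglia%%20monitoring%%20core/%s/ganglia-%s.tar.gz\n' \
    "$version" "$version"
}

# Download that tarball into a directory (defaults to our /nfs share).
download_ganglia() {
  local version="$1" dest="${2:-/nfs}"
  wget -P "$dest" "$(ganglia_url "$version")"
}
```

Running `download_ganglia 3.7.2` fetches the same file as the wget command above.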

On the server, we will install the build dependencies and libconfuse (from EPEL).

yum install freetype-devel rpm-build php httpd libpng-devel libart_lgpl-devel python-devel pcre-devel autoconf automake libtool expat-devel rrdtool-devel apr-devel gcc-c++ make pkgconfig -y
yum install https://dl.fedoraproject.org/pub/epel/7/x86_64/l/libconfuse-2.7-7.el7.x86_64.rpm -y
yum install https://dl.fedoraproject.org/pub/epel/7/x86_64/l/libconfuse-devel-2.7-7.el7.x86_64.rpm -y

Now, we will build the RPMs from ganglia-3.7.2 on the server.

rpmbuild -tb ganglia-3.7.2.tar.gz

After running rpmbuild, /root/rpmbuild/RPMS/x86_64 contains the generated RPMs. Install all of them on the server:

cd /root/rpmbuild/RPMS/x86_64/
yum install *.rpm -y

We don’t need gmetad on the clients, so we remove its RPM and send the rest of the RPMs to each client’s /tmp folder:

cd /root/rpmbuild/RPMS/x86_64/
rm -f ganglia-gmetad*.rpm
scp *.rpm root@1.buhpc.com:/tmp
scp *.rpm root@2.buhpc.com:/tmp
scp *.rpm root@4.buhpc.com:/tmp
scp *.rpm root@5.buhpc.com:/tmp
scp *.rpm root@6.buhpc.com:/tmp
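
The five scp commands can also be written as one loop. This sketch assumes root SSH access to every client, as the commands above do; `CLIENTS`, `run`, `client_rpms`, and `send_rpms` are hypothetical names. Instead of deleting the gmetad RPM it simply skips it, and setting `DRY_RUN=1` prints the scp commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes root SSH access to each client.
CLIENTS=(1.buhpc.com 2.buhpc.com 4.buhpc.com 5.buhpc.com 6.buhpc.com)

# Print the command instead of running it when DRY_RUN is set.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# List every RPM in a directory except gmetad, which only the server needs.
client_rpms() {
  local dir="$1" f
  for f in "$dir"/*.rpm; do
    [[ -e "$f" ]] || continue   # no RPMs: skip the unexpanded glob
    [[ "$(basename "$f")" == ganglia-gmetad* ]] && continue
    printf '%s\n' "$f"
  done
}

# Copy the client RPMs to /tmp on every client.
send_rpms() {
  local dir="${1:-/root/rpmbuild/RPMS/x86_64}" host
  local -a rpms
  mapfile -t rpms < <(client_rpms "$dir")
  for host in "${CLIENTS[@]}"; do
    run scp "${rpms[@]}" "root@${host}:/tmp"
  done
}
```

For example, `DRY_RUN=1 send_rpms` shows what would be copied, and `send_rpms` does the copy.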

SSH into each client (replace # with the client’s number), then install libconfuse and the RPMs we copied to /tmp:

ssh root@#.buhpc.com
yum install https://dl.fedoraproject.org/pub/epel/7/x86_64/l/libconfuse-2.7-7.el7.x86_64.rpm -y
yum install https://dl.fedoraproject.org/pub/epel/7/x86_64/l/libconfuse-devel-2.7-7.el7.x86_64.rpm -y
yum install /tmp/*.rpm -y
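
Rather than logging into each client by hand, the server can run the same installation over SSH. A sketch, assuming root SSH access and that the EPEL libconfuse URLs above still resolve; `CLIENTS`, `EPEL`, `run`, `install_clients`, and the `DRY_RUN` switch are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes root SSH access and reachable EPEL URLs.
CLIENTS=(1.buhpc.com 2.buhpc.com 4.buhpc.com 5.buhpc.com 6.buhpc.com)
EPEL=https://dl.fedoraproject.org/pub/epel/7/x86_64/l

# Print the command instead of running it when DRY_RUN is set.
run() { if [[ -n "${DRY_RUN:-}" ]]; then echo "$*"; else "$@"; fi; }

install_clients() {
  local host
  for host in "${CLIENTS[@]}"; do
    # The remote command is a single quoted string, so /tmp/*.rpm is
    # expanded by the client's shell, not by the server's.
    run ssh "root@${host}" "yum install -y ${EPEL}/libconfuse-2.7-7.el7.x86_64.rpm ${EPEL}/libconfuse-devel-2.7-7.el7.x86_64.rpm && yum install -y /tmp/*.rpm"
  done
}
```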

Back on the server, we will adjust the gmetad configuration file:

cd /etc/ganglia
vim gmetad.conf

buhpc will be the name of our cluster. Find the data_source line and set it to your cluster’s name, the polling interval in seconds (1 here), and the address of the node to poll. I use the hostname instead of the IP address.

data_source "buhpc" 1 3.buhpc.com
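
The data_source line can also be set without opening vim. This sketch assumes GNU sed and a single uncommented data_source line, which is how the stock gmetad.conf appears to ship; `set_data_source` is a hypothetical name, and the values must not contain `|` or `&`, which sed would treat specially here.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes GNU sed (for -i) and one active
# data_source line. Commented-out examples start with "#" and are untouched.
set_data_source() {
  local conf="$1" name="$2" interval="$3" host="$4"
  sed -i "s|^data_source .*|data_source \"${name}\" ${interval} ${host}|" "$conf"
}
```

On our server this would be `set_data_source /etc/ganglia/gmetad.conf buhpc 1 3.buhpc.com`.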

Now, we edit the server’s gmond configuration file.

vim /etc/ganglia/gmond.conf

Make sure these sections contain the following, with one udp_send_channel section per node, and comment out any other lines within each section.

cluster {
  name = "buhpc"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  host = 1.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 2.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 3.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 4.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 5.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 6.buhpc.com
  port = 8649
  ttl = 1
}

udp_recv_channel {
  port = 8649
  retry_bind = true
}
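
Because every node needs its own udp_send_channel, these sections grow with the cluster. A small generator keeps them in sync with one host list. This is a sketch, not a full gmond.conf: it prints only the three kinds of sections shown above, which you would paste over the matching sections in the file. `HOSTS` and `gmond_sections` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the cluster, udp_send_channel, and
# udp_recv_channel sections; it does not produce a complete gmond.conf.
HOSTS=(1.buhpc.com 2.buhpc.com 3.buhpc.com 4.buhpc.com 5.buhpc.com 6.buhpc.com)

# Usage: gmond_sections CLUSTER PORT HOST...
gmond_sections() {
  local cluster="$1" port="$2" host
  shift 2
  cat <<EOF
cluster {
  name = "${cluster}"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
EOF
  # One unicast send channel per node.
  for host in "$@"; do
    cat <<EOF

udp_send_channel {
  host = ${host}
  port = ${port}
  ttl = 1
}
EOF
  done
  cat <<EOF

udp_recv_channel {
  port = ${port}
  retry_bind = true
}
EOF
}
```

For example, `gmond_sections buhpc 8649 "${HOSTS[@]}"` prints the sections above; adding a seventh node only means extending `HOSTS`.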

Now SSH into each client and open its gmond configuration file:

vim /etc/ganglia/gmond.conf

Change each client’s gmond.conf the same way as the server’s: make sure these sections contain the following lines, and comment out any other lines within each section.

cluster {
  name = "buhpc"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  host = 1.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 2.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 3.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 4.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 5.buhpc.com
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 6.buhpc.com
  port = 8649
  ttl = 1
}

udp_recv_channel {
  port = 8649
  retry_bind = true
}
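
Since the same sections have to be edited on six machines, a quick check before starting gmond can catch a missed host. This sketch only looks for uncommented `key = value` lines with awk; it does not parse gmond’s configuration syntax, does not verify that a host line sits inside a udp_send_channel section, and assumes names without spaces. `has_setting` and `check_gmond_conf` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical grep-level check, not a real gmond.conf parser.

# Succeed only if an uncommented "KEY = VALUE" line exists in FILE.
# Usage: has_setting FILE KEY VALUE
has_setting() {
  awk -v k="$2" -v v="$3" \
    '$1 == k && $2 == "=" && $3 == v { found = 1 } END { exit !found }' "$1"
}

# Usage: check_gmond_conf FILE CLUSTER HOST...
check_gmond_conf() {
  local conf="$1" cluster="$2" host status=0
  shift 2
  has_setting "$conf" name "\"${cluster}\"" ||
    { echo "cluster name \"${cluster}\" not found" >&2; status=1; }
  for host in "$@"; do
    has_setting "$conf" host "$host" ||
      { echo "no udp_send_channel for ${host}" >&2; status=1; }
  done
  return "$status"
}
```

On a client, `check_gmond_conf /etc/ganglia/gmond.conf buhpc {1..6}.buhpc.com` prints any missing entries and returns non-zero if something is off.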

Then enable and start gmond on each client:

chkconfig gmond on
systemctl start gmond
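
To confirm that a client’s gmond is collecting data, you can read the XML that gmond serves over TCP. This sketch assumes the default tcp_accept_channel on port 8649 is still present (we did not edit that section) and that each host appears as a `<HOST NAME="...">` element with NAME as the first attribute; `reported_hosts` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read gmond's XML dump on stdin and print the
# NAME attribute of every <HOST ...> element, sorted and de-duplicated.
# Assumes NAME is the first attribute of each HOST element.
reported_hosts() {
  grep -o '<HOST NAME="[^"]*"' | sed 's/^<HOST NAME="//; s/"$//' | sort -u
}
```

On a client, `reported_hosts < /dev/tcp/localhost/8649` uses bash’s built-in /dev/tcp redirection, so no extra tool is needed. Once every node is sending metrics, it should list all six hostnames.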

Back on the server, we want to install the Ganglia web interface.

cd /nfs
wget http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.1.1%20%28Wien%29/ganglia-web-3.1.1-1.noarch.rpm -O ganglia-web-3.1.1-1.noarch.rpm
yum install -y ganglia-web-3.1.1-1.noarch.rpm

Next, disable SELinux: change SELINUX in /etc/sysconfig/selinux from enforcing to disabled, then reboot the server node.

vim /etc/sysconfig/selinux
SELINUX=disabled
reboot
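
The SELinux edit can be scripted as well. On CentOS 7, /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config, and `sed -i` on the symlink would replace the link with a plain file, so this sketch edits the target directly. `disable_selinux` is a hypothetical name; GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper; edits the real config file rather than the
# /etc/sysconfig/selinux symlink. Assumes GNU sed (for -i).
disable_selinux() {
  local conf="${1:-/etc/selinux/config}"
  # ^SELINUX= does not match the separate SELINUXTYPE= line.
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}
```

Running `setenforce 0` afterwards switches the running system to permissive mode until the reboot takes effect.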

Now, on the server, open the HTTP service and the Ganglia ports on the firewall:

firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-port=8649/udp
firewall-cmd --permanent --zone=public --add-port=8649/tcp
firewall-cmd --permanent --zone=public --add-port=8651/tcp
firewall-cmd --permanent --zone=public --add-port=8652/tcp
firewall-cmd --reload
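
The same rules can be written as a loop. Going by Ganglia’s defaults, 8649 is gmond (UDP for metrics, TCP for the XML dump), while 8651 and 8652 are gmetad’s xml_port and interactive_port. `run`, `open_ganglia_ports`, and the `DRY_RUN` switch are hypothetical names. Since the clients also receive unicast metrics on 8649/udp, they would presumably need that port opened too if firewalld is running on them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; ports follow Ganglia's defaults (gmond 8649,
# gmetad xml_port 8651, gmetad interactive_port 8652).

# Print the command instead of running it when DRY_RUN is set.
run() { if [[ -n "${DRY_RUN:-}" ]]; then echo "$*"; else "$@"; fi; }

open_ganglia_ports() {
  local port
  run firewall-cmd --permanent --zone=public --add-service=http
  for port in 8649/udp 8649/tcp 8651/tcp 8652/tcp; do
    run firewall-cmd --permanent --zone=public --add-port="$port"
  done
  # Permanent rules only apply after a reload.
  run firewall-cmd --reload
}
```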

On the server, we will now start httpd, gmetad, and gmond.

chkconfig httpd on
chkconfig gmetad on
chkconfig gmond on
systemctl start httpd
systemctl start gmetad
systemctl start gmond
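
The server services can also be started in one loop. On CentOS 7, `systemctl enable` is the native equivalent of `chkconfig ... on` (chkconfig forwards the request to systemctl for systemd units). As before, `run`, `start_server_services`, and `DRY_RUN` are hypothetical names in this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; enables and starts httpd, gmetad, and gmond.

# Print the command instead of running it when DRY_RUN is set.
run() { if [[ -n "${DRY_RUN:-}" ]]; then echo "$*"; else "$@"; fi; }

start_server_services() {
  local svc
  for svc in httpd gmetad gmond; do
    run systemctl enable "$svc"
    run systemctl start "$svc"
  done
  # Prints one state per unit and exits non-zero if any is not active.
  run systemctl is-active httpd gmetad gmond
}
```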

Visit http://3.buhpc.com/ganglia to see the Ganglia web interface. You should see something like this:

[Screenshot: Ganglia home page]