Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is useful for monitoring the nodes of a cluster, but setting it up on CentOS 7 across several nodes can be confusing. In this post, I’ll show you how to set up Ganglia and its web interface properly. Our cluster has 6 nodes connected through a switch.
Cluster Server and Clients
I configured our nodes with the following hostnames using these steps. Our server is:
3.buhpc.com
The clients are:
1.buhpc.com
2.buhpc.com
4.buhpc.com
5.buhpc.com
6.buhpc.com
Installation
On the server, inside the shared folder of our cluster, we will first download the latest version of Ganglia (3.7.2 here). For our cluster, /nfs is the folder with our network file system.
cd /nfs
wget http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.7.2/ganglia-3.7.2.tar.gz
On the server, we will install dependencies and libconfuse.
yum install freetype-devel rpm-build php httpd libpng-devel libart_lgpl-devel python-devel pcre-devel autoconf automake libtool expat-devel rrdtool-devel apr-devel gcc-c++ make pkgconfig -y
yum install https://dl.fedoraproject.org/pub/epel/7/x86_64/l/libconfuse-2.7-7.el7.x86_64.rpm -y
yum install https://dl.fedoraproject.org/pub/epel/7/x86_64/l/libconfuse-devel-2.7-7.el7.x86_64.rpm -y
Now, we will build the rpms from ganglia-3.7.2 on the server.
rpmbuild -tb ganglia-3.7.2.tar.gz
After running rpmbuild, /root/rpmbuild/RPMS/x86_64 contains the generated rpms. Install them on the server:
cd /root/rpmbuild/RPMS/x86_64/
yum install *.rpm -y
We will remove gmetad because we do not need it on the clients. Send the rest of the rpms to all the clients’ /tmp folder:
cd /root/rpmbuild/RPMS/x86_64/
rm -rf ganglia-gmetad*.rpm
scp *.rpm root@1.buhpc.com:/tmp
scp *.rpm root@2.buhpc.com:/tmp
scp *.rpm root@4.buhpc.com:/tmp
scp *.rpm root@5.buhpc.com:/tmp
scp *.rpm root@6.buhpc.com:/tmp
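The five scp commands differ only in the hostname, so they can be written as a loop. The following is a minimal sketch, not part of the original setup: copy_rpms is a hypothetical helper, and logging in as root is an assumption. By default it only prints the commands so you can check them first.

```shell
# Client hostnames from the list above; the server (3.buhpc.com) is left out.
clients="1.buhpc.com 2.buhpc.com 4.buhpc.com 5.buhpc.com 6.buhpc.com"

# copy_rpms is a hypothetical helper (not a Ganglia tool).
# It prints one scp command per client; set RUN=1 to execute them instead.
# The root login is an assumption; use whichever account you have on the clients.
copy_rpms() {
    for host in $clients; do
        if [ "${RUN:-0}" = "1" ]; then
            scp /root/rpmbuild/RPMS/x86_64/*.rpm "root@${host}:/tmp"
        else
            echo "scp /root/rpmbuild/RPMS/x86_64/*.rpm root@${host}:/tmp"
        fi
    done
}

copy_rpms
```

Once the printed commands look right, running the script with RUN=1 performs the actual copies.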
SSH into every client (replace # with the client’s number) and install the rpms that we will need:
ssh root@#.buhpc.com
yum install https://dl.fedoraproject.org/pub/epel/7/x86_64/l/libconfuse-2.7-7.el7.x86_64.rpm -y
yum install https://dl.fedoraproject.org/pub/epel/7/x86_64/l/libconfuse-devel-2.7-7.el7.x86_64.rpm -y
yum install /tmp/*.rpm -y
Back on the server, we will adjust the gmetad configuration file:
cd /etc/ganglia
vim gmetad.conf
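buhpc will be the name of our cluster. Find the data_source line and set it to the name of your cluster and the address of the node gmetad should poll. The number after the name is the polling interval in seconds. I am using the hostname instead of the IP address.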
data_source "buhpc" 1 3.buhpc.com
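If you would rather not edit the file by hand, the same change can be made with sed. This is a sketch under assumptions: it works on a scratch copy so it is safe to try, and the starting line shown is what the stock gmetad.conf usually ships with, which may differ in your build.

```shell
# Build a scratch file that stands in for /etc/ganglia/gmetad.conf.
# The starting line is an assumption about the shipped default.
gmetad_conf=$(mktemp)
printf '%s\n' 'data_source "my cluster" localhost' > "$gmetad_conf"

# Rewrite the active data_source line: cluster name, polling interval (s), host.
sed 's|^data_source .*|data_source "buhpc" 1 3.buhpc.com|' "$gmetad_conf" > "$gmetad_conf.new"

# Show the result before copying it over the real file.
cat "$gmetad_conf.new"
```

Commented-out data_source examples in the real file start with # and are left alone by the ^data_source anchor.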
Now, we edit the server’s gmond configuration file.
vim /etc/ganglia/gmond.conf
Make sure that these sections contain the following, and comment out any extra lines you see within each section.
cluster {
  name = "buhpc"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  host = 1.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 2.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 3.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 4.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 5.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 6.buhpc.com
  port = 8649
  ttl = 1
}
udp_recv_channel {
  port = 8649
  retry_bind = true
}
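The six udp_send_channel sections are identical apart from the node number, so for a larger cluster it may be easier to generate them than to type them. A minimal sketch, where gen_send_channels is a hypothetical helper and the node numbers 1 to 6 match this cluster:

```shell
# gen_send_channels is a hypothetical helper: it prints one udp_send_channel
# section per node, using the same port and ttl as the config above.
gen_send_channels() {
    for n in 1 2 3 4 5 6; do
        printf 'udp_send_channel {\n  host = %s.buhpc.com\n  port = 8649\n  ttl = 1\n}\n' "$n"
    done
}

gen_send_channels
```

The output can be pasted into gmond.conf in place of the hand-written sections.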
Now, SSH into each of the clients and do the following individually. On every client:
vim /etc/ganglia/gmond.conf
We will change the clients’ gmond.conf in the same way as the server’s. Make sure that these sections contain the following lines, and comment out any extra lines you see within each section.
cluster {
  name = "buhpc"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  host = 1.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 2.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 3.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 4.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 5.buhpc.com
  port = 8649
  ttl = 1
}
udp_send_channel {
  host = 6.buhpc.com
  port = 8649
  ttl = 1
}
udp_recv_channel {
  port = 8649
  retry_bind = true
}
We will start gmond on the clients for monitoring.
chkconfig gmond on
systemctl start gmond
Back on the server, we want to install the Ganglia web interface.
cd /nfs
wget http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.1.1%20%28Wien%29/ganglia-web-3.1.1-1.noarch.rpm -O ganglia-web-3.1.1-1.noarch.rpm
yum install -y ganglia-web-3.1.1-1.noarch.rpm
Next, we will want to disable SELinux. Change SELINUX inside /etc/sysconfig/selinux from enforcing to disabled. Then, restart the server node.
vim /etc/sysconfig/selinux
SELINUX=disabled
reboot
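The SELINUX change can also be scripted with sed instead of vim. The sketch below runs against a scratch copy so it is harmless to try; the three sample lines are an assumption about what the stock file looks like.

```shell
# Scratch file standing in for /etc/sysconfig/selinux (contents are a sample).
selinux_conf=$(mktemp)
printf '%s\n' \
    '# This file controls the state of SELinux on the system.' \
    'SELINUX=enforcing' \
    'SELINUXTYPE=targeted' > "$selinux_conf"

# Flip only the SELINUX= line; SELINUXTYPE= is left untouched.
sed 's/^SELINUX=enforcing$/SELINUX=disabled/' "$selinux_conf" > "$selinux_conf.new"

grep '^SELINUX' "$selinux_conf.new"
```

On CentOS 7, /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config, so if you use sed -i on the real system it is safer to target /etc/selinux/config directly or pass GNU sed's --follow-symlinks option.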
Now, on the server, we’ll open the correct ports on the firewall.
firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --permanent --zone=public --add-port=8649/udp
firewall-cmd --permanent --zone=public --add-port=8649/tcp
firewall-cmd --permanent --zone=public --add-port=8651/tcp
firewall-cmd --permanent --zone=public --add-port=8652/tcp
firewall-cmd --reload
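For reference, 8649 is the gmond port used in the configs above, and 8651 and 8652 are, as far as I know, gmetad's default XML and interactive ports. If you keep the port list in one place, the rules can be generated in a loop. In this sketch print_firewall_cmds is a hypothetical helper that only prints the commands.

```shell
# print_firewall_cmds is a hypothetical helper: it prints the firewall-cmd
# calls from this section and changes nothing by itself.
print_firewall_cmds() {
    echo "firewall-cmd --permanent --zone=public --add-service=http"
    for rule in 8649/udp 8649/tcp 8651/tcp 8652/tcp; do
        echo "firewall-cmd --permanent --zone=public --add-port=${rule}"
    done
    echo "firewall-cmd --reload"
}

print_firewall_cmds
```

To apply the rules, pipe the output to sh as root on the server.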
On the server, we will now start httpd, gmetad, and gmond.
chkconfig httpd on
chkconfig gmetad on
chkconfig gmond on
systemctl start httpd
systemctl start gmetad
systemctl start gmond
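Before opening the web page, you can check that gmond is reporting all nodes. gmond serves an XML dump of the cluster state over TCP on its port (8649 here), with one HOST element per node. The sketch below counts those elements; count_hosts is a hypothetical helper, the XML fed to it is a made-up sample rather than real output, and it assumes each HOST element starts on its own line.

```shell
# count_hosts is a hypothetical helper: it reads gmond XML on stdin and
# counts the lines that open a <HOST ...> element.
count_hosts() {
    grep -c '<HOST '
}

# Made-up sample input so the sketch runs anywhere.
# On the server, the live equivalent would be something like:
#   nc 3.buhpc.com 8649 | count_hosts
printf '%s\n' \
    '<CLUSTER NAME="buhpc">' \
    '<HOST NAME="1.buhpc.com">' \
    '</HOST>' \
    '<HOST NAME="2.buhpc.com">' \
    '</HOST>' \
    '</CLUSTER>' | count_hosts
```

On this cluster the live count should reach 6 once every gmond has started and reported in.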
Visit http://3.buhpc.com/ganglia to see Ganglia’s monitoring. You should see something like this: