GlusterFS is a scale-out network-attached storage file system. In this tutorial, we’ll set up GlusterFS on a CentOS 7 cluster. Our cluster has 6 nodes connected through a switch. I’ll be using all 6 nodes as servers for distributed replicated storage, with room for additional nodes to join later as clients that access files from the GlusterFS servers.
How does GlusterFS work?
In GlusterFS, servers store data in a distributed manner, and clients access that data. Let’s explain with our 6-node example. I’m using a replica count of 3, so every file is mirrored across a set of 3 bricks. With 6 server nodes, gluster forms two replica sets of three, grouped in the order the bricks are listed when the volume is created: nodes 3, 1, and 2 (replica set 1) and nodes 4, 5, and 6 (replica set 2) will mirror each other.
Some files land in replica set 1 and others in replica set 2; gluster decides which by hashing the file name. Taken together, the two replica sets combine into one larger storage pool (distribution), which is why the volume type is Distributed-Replicate. The charm of GlusterFS is that file locations are calculated with this elastic hashing algorithm rather than looked up on a metadata server, so there is no metadata bottleneck.
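Once the volume is mounted on a client (as we do at the end of this tutorial), you can see this placement for yourself. As a quick illustration that is not part of the original setup, the FUSE client exposes a virtual extended attribute reporting which bricks hold a file; getfattr comes from the attr package, and slothparadise.txt is the example file created later:
getfattr -n trusted.glusterfs.pathinfo /mnt/slothparadise.txt
The output lists the brick paths of the replica set that stores the file.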
Cluster Servers
I configured our nodes with the following hostnames using these steps. Our servers are:
3.buhpc.com 1.buhpc.com 2.buhpc.com 4.buhpc.com 5.buhpc.com 6.buhpc.com
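Every node must be able to resolve these hostnames. If you are not using DNS, a minimal /etc/hosts sketch on each node works; the IP addresses below are placeholders for your own network:
192.168.1.11 1.buhpc.com
192.168.1.12 2.buhpc.com
192.168.1.13 3.buhpc.com
192.168.1.14 4.buhpc.com
192.168.1.15 5.buhpc.com
192.168.1.16 6.buhpc.com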
Setting up the GlusterFS Servers
Update the system, add the GlusterFS yum repository, and install EPEL.
yum update -y
wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo
yum install http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Install the GlusterFS server and Samba.
yum install glusterfs-server samba -y
We will make a directory on every server node; this is the brick, the location where gluster stores its data on each server.
mkdir -p /gfs/glustervol
On every server node, we want to start the gluster service.
systemctl enable glusterd.service && systemctl start glusterd.service
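To confirm the daemon is actually running on each node:
systemctl status glusterd.service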
On every server node, if you have firewalld running, we want to open the correct ports.
firewall-cmd --zone=public --add-port=24009/tcp --permanent
firewall-cmd --zone=public --add-port=24007/tcp --permanent
firewall-cmd --zone=public --add-service=nfs --add-service=samba --add-service=samba-client --permanent
firewall-cmd --zone=public --add-port=111/tcp --add-port=139/tcp --add-port=445/tcp --add-port=965/tcp --add-port=2049/tcp --add-port=38465-38469/tcp --add-port=631/tcp --add-port=111/udp --add-port=963/udp --add-port=49152-49251/tcp --permanent
firewall-cmd --reload
You should see success on every added firewalld rule.
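You can double-check that the rules took effect after the reload:
firewall-cmd --zone=public --list-ports
firewall-cmd --zone=public --list-services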
The Main GlusterFS Server
For our setup, we chose our 3.buhpc.com node to be our main server that connects all the other servers. Choose one node as the main server and connect the peers:
gluster peer probe 1.buhpc.com
gluster peer probe 2.buhpc.com
gluster peer probe 4.buhpc.com
gluster peer probe 5.buhpc.com
gluster peer probe 6.buhpc.com
We can check if we successfully added all the peers to our main server.
gluster peer status
Number of Peers: 5

Hostname: cumm024-0b08-dhcp07.bu.edu
Uuid: b7c48a28-2229-49f5-af28-41cd9cce2fe6
State: Peer in Cluster (Connected)
Other names:
2.buhpc.com

Hostname: 1.buhpc.com
Uuid: 5eacbc2e-6490-47bb-b4fd-9a2575db941f
State: Peer in Cluster (Connected)

Hostname: 4.buhpc.com
Uuid: 240282d8-a4cb-4bbc-8ca6-00a3383a0c48
State: Peer in Cluster (Connected)

Hostname: 5.buhpc.com
Uuid: 4edc641b-dbcb-415f-9618-718087004adc
State: Peer in Cluster (Connected)

Hostname: 6.buhpc.com
Uuid: 24364805-7cbe-405d-adcf-a6334f9f6e40
State: Peer in Cluster (Connected)
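For a shorter summary that also includes the local node, gluster can print the whole pool:
gluster pool list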
Now, we will create the GlusterFS volume, named glustervol. With replica 3 and six bricks, gluster groups the bricks into two replica sets of three, in the order they are listed. We append force because our bricks live on the root filesystem, which gluster would otherwise reject:
gluster volume create glustervol replica 3 transport tcp 3.buhpc.com:/gfs/glustervol 1.buhpc.com:/gfs/glustervol 2.buhpc.com:/gfs/glustervol 4.buhpc.com:/gfs/glustervol 5.buhpc.com:/gfs/glustervol 6.buhpc.com:/gfs/glustervol force
If all goes well, we can start the gluster volume.
gluster volume start glustervol
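Once it has started, you can verify that a brick process is online for every node, then inspect the volume details:
gluster volume status glustervol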
gluster volume info all
Volume Name: glustervol
Type: Distributed-Replicate
Volume ID: ed995f44-6649-48d0-b5a8-7e87c3568473
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 3.buhpc.com:/gfs/glustervol
Brick2: 1.buhpc.com:/gfs/glustervol
Brick3: 2.buhpc.com:/gfs/glustervol
Brick4: 4.buhpc.com:/gfs/glustervol
Brick5: 5.buhpc.com:/gfs/glustervol
Brick6: 6.buhpc.com:/gfs/glustervol
Options Reconfigured:
performance.readdir-ahead: on
If we go into /gfs/glustervol on a server and create a file, it appears on the other servers in that brick’s replica set. (Note that in normal use you should write through a client mount rather than directly into a brick directory; writing to bricks directly bypasses gluster.)
cd /gfs/glustervol
touch slothparadise.txt
ssh root@1.buhpc.com ls /gfs/glustervol
slothparadise.txt
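To see which servers actually hold the file, a small loop from the main server works (assuming passwordless root SSH between nodes):
for n in 1 2 4 5 6; do ssh "root@$n.buhpc.com" ls /gfs/glustervol; done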
Connecting Clients
The servers store the data in a distributed, replicated manner. Now we can add clients that access those files. Let’s say we have a seventh node, 7.buhpc.com. Here’s how we would add node 7 as a client of the 6 server nodes. On node 7, install the client packages:
yum install glusterfs glusterfs-fuse attr -y
After installing the necessary GlusterFS client dependencies, we can mount the volume onto a directory on node 7. We mount with type glusterfs, referencing the volume name glustervol through the main server, onto /mnt:
mount -t glusterfs 3.buhpc.com:/glustervol /mnt/
ls /mnt
slothparadise.txt
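To make the mount persistent across reboots, you can also add a line to /etc/fstab on the client; this is a sketch, and the _netdev option delays mounting until the network is up:
3.buhpc.com:/glustervol /mnt glusterfs defaults,_netdev 0 0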
Deleting Gluster Volume
You may want to delete a gluster volume in the future. To find what gluster volumes you have:
gluster volume info all
To stop and delete the gluster volume:
gluster volume stop nameOfVolume
gluster volume delete nameOfVolume
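Deleting a volume does not erase the data already written to the brick directories. If you later try to reuse the same brick path for a new volume, gluster will refuse because the directory still carries volume metadata in extended attributes. A cleanup sketch, run on each server with setfattr from the attr package (this assumes you no longer need the data, which is destroyed):
setfattr -x trusted.glusterfs.volume-id /gfs/glustervol
setfattr -x trusted.gfid /gfs/glustervol
rm -rf /gfs/glustervol/.glusterfs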