GlusterFS is a scale-out network-attached storage file system. In this tutorial, we'll set up GlusterFS on a CentOS 7 cluster. Our cluster has 6 nodes connected through a switch. I'll use all 6 nodes as servers for distributed replicated storage, with the option of adding more nodes later as clients that access files from the GlusterFS servers.

How Does GlusterFS Work?

In GlusterFS, servers store data in a distributed manner, and clients access that data. Let's explain with our 6-node example. I'm using a replica count of 3, so every 3 bricks form one replica set that mirrors the same data. With 6 server nodes, that gives us two replica sets: nodes 3, 1, and 2 (replica set 1) mirror each other, and nodes 4, 5, and 6 (replica set 2) mirror each other, following the order in which we list the bricks when we create the volume.

Some files land on replica set 1, and others on replica set 2. If you think about the entire storage file system, the two replica sets combine into one larger storage pool (distribution). The charm of GlusterFS is that it calculates where a file lives with a hash instead of looking it up in a central metadata server, so there is less of a bottleneck.
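
As a rough sketch of that layout (assuming the brick order we use when creating the volume later in this tutorial), consecutive bricks on the create command line form a replica set, and files are hashed across the two sets:

replica set 1 (mirrors): 3.buhpc.com  1.buhpc.com  2.buhpc.com
replica set 2 (mirrors): 4.buhpc.com  5.buhpc.com  6.buhpc.com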

[Illustration: GlusterFS distributed replicated volume layout]


Cluster Servers

I configured our nodes with the following hostnames using these steps. Our servers are:

3.buhpc.com
1.buhpc.com
2.buhpc.com
4.buhpc.com
5.buhpc.com
6.buhpc.com

Setting up the GlusterFS Servers

Update yum, add the GlusterFS repository, and install the EPEL release package.

yum update -y
wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo
yum install http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
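
Optionally, you can double-check that the new repositories are visible to yum before continuing:

yum repolist | grep -i -E "gluster|epel"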

Install the GlusterFS server and Samba.

yum install glusterfs-server samba -y
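
To confirm the install worked, you can check the GlusterFS version on each node:

gluster --version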

We will make a directory on every server node; this directory will be the brick where GlusterFS stores its data.

mkdir -p /gfs/glustervol

On every server node, we want to start the gluster service.

systemctl enable glusterd.service && systemctl start glusterd.service
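
You can verify that the daemon came up on each node:

systemctl status glusterd.service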

On every server node, if firewalld is running, we want to open the correct ports.

firewall-cmd --zone=public --add-port=24009/tcp --permanent
firewall-cmd --zone=public --add-port=24007/tcp --permanent
firewall-cmd --zone=public --add-service=nfs --add-service=samba --add-service=samba-client --permanent
firewall-cmd --zone=public --add-port=111/tcp --add-port=139/tcp --add-port=445/tcp --add-port=965/tcp --add-port=2049/tcp --add-port=38465-38469/tcp --add-port=631/tcp --add-port=111/udp --add-port=963/udp --add-port=49152-49251/tcp  --permanent
firewall-cmd --reload

You should see success printed for every firewalld rule that you add.
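
To confirm the rules took effect, list what the public zone now allows:

firewall-cmd --zone=public --list-ports
firewall-cmd --zone=public --list-services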


The Main GlusterFS Server

For our setup, we chose our 3.buhpc.com node to be the main server that connects all the other servers into the trusted pool. Choose one node as the main server and probe the peers from it:

gluster peer probe 1.buhpc.com
gluster peer probe 2.buhpc.com
gluster peer probe 4.buhpc.com
gluster peer probe 5.buhpc.com
gluster peer probe 6.buhpc.com

We can check if we successfully added all the peers to our main server.

gluster peer status
Number of Peers: 5
Hostname: cumm024-0b08-dhcp07.bu.edu
Uuid: b7c48a28-2229-49f5-af28-41cd9cce2fe6
State: Peer in Cluster (Connected)
Other names:
2.buhpc.com

Hostname: 1.buhpc.com
Uuid: 5eacbc2e-6490-47bb-b4fd-9a2575db941f
State: Peer in Cluster (Connected)

Hostname: 4.buhpc.com
Uuid: 240282d8-a4cb-4bbc-8ca6-00a3383a0c48
State: Peer in Cluster (Connected)

Hostname: 5.buhpc.com
Uuid: 4edc641b-dbcb-415f-9618-718087004adc
State: Peer in Cluster (Connected)

Hostname: 6.buhpc.com
Uuid: 24364805-7cbe-405d-adcf-a6334f9f6e40
State: Peer in Cluster (Connected)

Now, we will create the GlusterFS volume, which we are naming glustervol. We use a replica count of 3, which groups our 6 bricks into two replica sets of 3. The force flag at the end is needed because our bricks live on the root partition, which GlusterFS would otherwise refuse to use.

gluster volume create glustervol replica 3 transport tcp 3.buhpc.com:/gfs/glustervol 1.buhpc.com:/gfs/glustervol 2.buhpc.com:/gfs/glustervol 4.buhpc.com:/gfs/glustervol 5.buhpc.com:/gfs/glustervol 6.buhpc.com:/gfs/glustervol force

If all goes well, we can start the gluster volume.

gluster volume start glustervol
gluster volume info all
Volume Name: glustervol
Type: Distributed-Replicate
Volume ID: ed995f44-6649-48d0-b5a8-7e87c3568473
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 3.buhpc.com:/gfs/glustervol
Brick2: 1.buhpc.com:/gfs/glustervol
Brick3: 2.buhpc.com:/gfs/glustervol
Brick4: 4.buhpc.com:/gfs/glustervol
Brick5: 5.buhpc.com:/gfs/glustervol
Brick6: 6.buhpc.com:/gfs/glustervol
Options Reconfigured:
performance.readdir-ahead: on
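
Beyond the info output, you can also check that every brick and its self-heal daemon are online:

gluster volume status glustervol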

If we go into /gfs/glustervol and create a file, it appears in the brick directory on the other nodes of its replica set.

cd /gfs/glustervol
touch slothparadise.txt
ssh root@1.buhpc.com
ls /gfs/glustervol
slothparadise.txt


Connecting Clients

The servers now store data in a distributed, replicated manner. Next, we can add clients that access those files. Let's say we have a seventh node, 7.buhpc.com. Here's how we would add node 7 as a client of the 6 server nodes. On node 7, install the client packages:

yum install glusterfs glusterfs-fuse attr -y

After installing the necessary GlusterFS client packages, we can mount the volume onto a directory on node 7. We mount with type glusterfs, reference the volume glustervol through our main server, and mount it at /mnt on node 7.

mount -t glusterfs 3.buhpc.com:/glustervol /mnt/
ls /mnt
slothparadise.txt
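
If you want the client mount to survive reboots, a typical approach is an /etc/fstab entry on node 7; adjust the mount point to taste:

# append to /etc/fstab on node 7
3.buhpc.com:/glustervol /mnt glusterfs defaults,_netdev 0 0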


Deleting Gluster Volume

You may want to delete a gluster volume in the future. To find what gluster volumes you have:

gluster volume info all

To stop and delete the gluster volume:

gluster volume stop nameOfVolume
gluster volume delete nameOfVolume
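
Deleting a volume does not wipe the data already sitting in the brick directories. If you later want to reuse /gfs/glustervol as a brick for a new volume, a common cleanup, run on every server, is to strip GlusterFS's extended attributes and internal metadata directory; treat this as a sketch and adapt the paths to your setup:

setfattr -x trusted.glusterfs.volume-id /gfs/glustervol
setfattr -x trusted.gfid /gfs/glustervol
rm -rf /gfs/glustervol/.glusterfs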