GlusterFS 3.6.1 Split Brain resolution
Identify the bricks of the volume:
[root@server1 ~]# gluster volume info images
Volume Name: images
Type: Replicate
Volume ID: e60b5d4b-be1f-4233-b09c-84a97001021f
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1-gluster:/export/images/brick1
Brick2: server2-gluster:/export/images/brick1
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: fixed
cluster.quorum-count: 1
storage.owner-uid: 107
storage.owner-gid: 107
Get the path of the file that is in split-brain:
It can be obtained in either of two ways:
a) Run the command gluster volume heal <volname> info split-brain.
[root@server1 ~]# gluster volume heal images info split-brain
Gathering list of split brain entries on volume images has been successful
Brick server1-gluster:/export/images/brick1
Number of entries: 1024
at path on brick
-----------------------------------
2017-02-08 06:14:56 /srvmsim01v.img
2017-02-08 07:55:53 /srvmmdb02v.img
2017-02-08 07:55:53 /srvmmgw02v.img
etc.
Brick server2-gluster:/export/images/brick1
Number of entries: 1024
at path on brick
-----------------------------------
2017-03-01 09:36:44 /srvmmgw02v.img
2017-03-01 09:37:45 /srvmsim01v.img
2017-03-01 09:37:45 /srvmmdb02v.img
etc.
b) Identify the files for which file operations performed from the client keep failing with an Input/Output error.
Close the applications that have this file open from the mount point. Virtual machines that use the file must be powered off.
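The heal info listing reports the same file once per brick, so with around a thousand entries it helps to reduce it to a list of distinct paths. The following is a minimal sketch, assuming the 3.6-style output shown above where each entry line has the form "<date> <time> <path>"; the function name unique_split_brain_paths is hypothetical and not part of GlusterFS.

```shell
# Print each distinct path found in the output of
#   gluster volume heal <volname> info split-brain
# read from standard input.
# Assumption: entry lines look like "2017-02-08 06:14:56 /srvmsim01v.img".
unique_split_brain_paths() {
    # Keep only lines that start with a YYYY-MM-DD date, drop the date and
    # time fields, then sort and remove duplicates.
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} ' | cut -d' ' -f3- | sort -u
}

# Intended usage on a server of the trusted pool:
#   gluster volume heal images info split-brain | unique_split_brain_paths
```

Because the function only filters text, it can be tried on a saved copy of the command output before being used on a live volume.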
Decide on the correct copy:
This is done by inspecting the AFR changelog extended attributes of the file on each brick with the getfattr command, then identifying the type of split-brain (data split-brain, metadata split-brain, entry split-brain or split-brain due to gfid-mismatch), and finally determining which of the bricks contains the ‘good copy’ of the file.
getfattr -d -m . -e hex <file-path-on-brick>
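Each trusted.afr.* value consists of 24 hexadecimal digits: the first 8 are the changelog of data, the next 8 the changelog of metadata, and the last 8 the changelog of directory entries. For example:
0x 000003d7 00000001 00000000
       |        |        |
       |        |         \_ changelog of directory entries
       |         \_ changelog of metadata
        \_ changelog of data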
It is also possible that one brick might contain the correct data while the other might contain the correct metadata.
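Reading these counters by eye is error-prone, so the layout described above can be decoded with a few lines of shell. This is a minimal sketch; the function name decode_afr is hypothetical, and it assumes the value is passed exactly as getfattr -e hex prints it (an optional 0x prefix followed by 24 hexadecimal digits).

```shell
# Decode a trusted.afr.* changelog value into its three 32-bit counters.
decode_afr() {
    hex="${1#0x}"                      # strip the optional 0x prefix
    # Reject empty input and anything that is not a hexadecimal digit.
    case "$hex" in
        *[!0-9a-fA-F]*|"") echo "not a hex value: $1" >&2; return 1 ;;
    esac
    # Three counters of 8 digits each.
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits: $1" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    # $((0x...)) converts each hexadecimal field to decimal.
    echo "data=$((0x$data)) metadata=$((0x$meta)) entry=$((0x$entry))"
}

decode_afr 0x000003d70000000100000000   # prints: data=983 metadata=1 entry=0
```

A non-zero counter means the brick holding the attribute has recorded pending operations of that type against the brick named in the attribute.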
[root@server1 ~]# stat /export/images/brick1/srvmsim01v.img
File: `/export/images/brick1/srvmsim01v.img'
Size: 15228796928 Blocks: 29743528 IO Block: 4096 regular file
Device: 812h/2066d Inode: 116 Links: 2
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-03-01 09:46:23.599184357 -0500
Modify: 2017-03-01 12:23:04.693187960 -0500
Change: 2017-03-01 12:23:21.924583877 -0500
[root@server1 ~]# md5sum /export/images/brick1/srvmsim01v.img
cb21a48ee44309cd0a2bcf6bec4c0f7c /export/images/brick1/srvmsim01v.img
[root@server2 ~]# stat /export/images/brick1/srvmsim01v.img
File: `/export/images/brick1/srvmsim01v.img'
Size: 15228796928 Blocks: 22730808 IO Block: 4096 regular file
Device: 812h/2066d Inode: 115 Links: 2
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2015-09-16 19:01:35.737767450 -0400
Modify: 2017-03-01 12:23:17.320088786 -0500
Change: 2017-03-01 12:23:34.542382249 -0500
[root@server2 ~]# md5sum /export/images/brick1/srvmsim01v.img
5062e0f3ef1a0a2c36825cd769366276 /export/images/brick1/srvmsim01v.img
[root@server1 ~]# getfattr -d -m . -e hex /export/images/brick1/srvmsim01v.img
getfattr: Removing leading '/' from absolute path names
# file: export/images/brick1/srvmsim01v.img
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.images-client-0=0x000000000000000000000000
trusted.afr.images-client-1=0x015457d20000000000000000
trusted.gfid=0x43304ae0fa284e178e8364b837b30925
[root@server2 ~]# getfattr -d -m . -e hex /export/images/brick1/srvmsim01v.img
getfattr: Removing leading '/' from absolute path names
# file: export/images/brick1/srvmsim01v.img
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.images-client-0=0x000000040000000000000000
trusted.afr.images-client-1=0x000000000000000000000000
trusted.gfid=0x43304ae0fa284e178e8364b837b30925
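So the two copies differ in md5sum and allocated blocks but have the same size (the differing inode numbers are expected, because each brick is a separate filesystem). Only the data changelog (the first 8 digits) is non-zero: on server1, trusted.afr.images-client-1 records pending data operations against server2, and on server2, trusted.afr.images-client-0 records pending data operations against server1. Each brick blames the other, which makes this a data split-brain. The metadata and entry changelogs are zero on both bricks, so the metadata is not affected. I decided to keep the first replica (server1).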
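The comparison made here can be written down as a small check: a changelog type is in split-brain when both bricks hold a non-zero counter of that type against each other. The sketch below is illustrative only; split_brain_types is a hypothetical helper, it assumes a two-brick replica, and it does not detect gfid mismatches, which have to be checked by comparing trusted.gfid.

```shell
# Report which changelog types two bricks blame each other for.
# $1: value of trusted.afr.<volname>-client-1 as read on the first brick
# $2: value of trusted.afr.<volname>-client-0 as read on the second brick
# Assumption: both values are 24 hexadecimal digits, optionally prefixed by 0x.
split_brain_types() {
    a="${1#0x}"
    b="${2#0x}"
    [ "${#a}" -eq 24 ] && [ "${#b}" -eq 24 ] || {
        echo "expected two values of 24 hex digits" >&2
        return 1
    }
    result=""
    start=1
    for kind in data metadata entry; do
        end=$((start + 7))
        fa=$(printf '%s' "$a" | cut -c"$start"-"$end")
        fb=$(printf '%s' "$b" | cut -c"$start"-"$end")
        # Split-brain for this type: both counters are non-zero.
        if [ "$((0x$fa))" -ne 0 ] && [ "$((0x$fb))" -ne 0 ]; then
            result="$result $kind"
        fi
        start=$((start + 8))
    done
    echo "split-brain:${result:- none}"
}

# Values taken from the getfattr output of server1 and server2 above.
split_brain_types 0x015457d20000000000000000 0x000000040000000000000000
# prints: split-brain: data
```

If only one side holds a non-zero counter, the bricks agree on which copy is stale and self-heal can normally proceed without manual intervention.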
Reset the relevant extended attribute on the brick(s) that contain the ‘bad copy’ of the file data/metadata using the setfattr command.
setfattr -n <attribute-name> -v <attribute-value> <file-path-on-brick>
Here server2 holds the bad copy, so the attribute with which it blames server1 (trusted.afr.images-client-0) is reset to zero:
[root@server2 ~]# setfattr -n trusted.afr.images-client-0 -v 0x000000000000000000000000 /export/images/brick1/srvmsim01v.img
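Writing all zeros is safe in this case because the metadata and entry counters were already zero. When they are not, only the data counter should be cleared and the other two left as they are. The sketch below computes such a value; zero_data_counter is a hypothetical helper that only prints the new value and does not call setfattr itself.

```shell
# Print a changelog value with the data counter (first 8 hex digits) cleared
# and the metadata and entry counters kept unchanged.
# Assumption: the input is 24 hexadecimal digits, optionally prefixed by 0x.
zero_data_counter() {
    hex="${1#0x}"
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits: $1" >&2
        return 1
    fi
    rest=$(printf '%s' "$hex" | cut -c9-24)   # metadata + entry counters
    echo "0x00000000$rest"
}

zero_data_counter 0x000003d70000000100000000   # prints: 0x000000000000000100000000
```

The printed value can then be passed to setfattr with -v, after checking it against the getfattr output of the brick being reset.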
Trigger self-heal on the file by performing lookup from the client:
ls -l <file-path-on-gluster-mount>
[root@server1 ~]# ls -l /export/images/brick1/srvmsim01v.img
-rw------- 2 qemu qemu 15236399104 Mar 1 17:37 /export/images/brick1/srvmsim01v.img
[root@server1 ~]# ls -l /var/lib/libvirt/images/srvmsim01v.img
-rw------- 1 qemu qemu 15236399104 Mar 1 17:38 /var/lib/libvirt/images/srvmsim01v.img
Links:
https://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/
https://gluster.readthedocs.io/en/latest/Troubleshooting/heal-info-and-split-brain-resolution/ (if you are running 3.7 or higher)
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md