{"id":696,"date":"2017-03-02T05:43:42","date_gmt":"2017-03-01T22:43:42","guid":{"rendered":"http:\/\/blog.trichev.com\/?p=696"},"modified":"2017-08-10T09:08:47","modified_gmt":"2017-08-10T02:08:47","slug":"glusterfs-3-6-1-split-brain-resolution","status":"publish","type":"post","link":"https:\/\/trichev.com\/blog\/2017\/03\/02\/glusterfs-3-6-1-split-brain-resolution\/","title":{"rendered":"GlusterFS 3.6.1 Split Brain resolution"},"content":{"rendered":"<p>Identify the bricks info:<br \/>\n<code>[root@server1 ~]# gluster volume info images<\/code><\/p>\n<p><code>Volume Name: images<br \/>\nType: Replicate<br \/>\nVolume ID: e60b5d4b-be1f-4233-b09c-84a97001021f<br \/>\nStatus: Started<br \/>\nNumber of Bricks: 1 x 2 = 2<br \/>\nTransport-type: tcp<br \/>\nBricks:<br \/>\nBrick1: server1-gluster:\/export\/images\/brick1<br \/>\nBrick2: server2-gluster:\/export\/images\/brick1<br \/>\nOptions Reconfigured:<br \/>\ndiagnostics.count-fop-hits: on<br \/>\ndiagnostics.latency-measurement: on<br \/>\nperformance.quick-read: off<br \/>\nperformance.read-ahead: off<br \/>\nperformance.io-cache: off<br \/>\nperformance.stat-prefetch: off<br \/>\ncluster.eager-lock: enable<br \/>\nnetwork.remote-dio: enable<br \/>\ncluster.quorum-type: fixed<br \/>\ncluster.quorum-count: 1<br \/>\nstorage.owner-uid: 107<br \/>\nstorage.owner-gid: 107<\/code><\/p>\n<p>Get the path of the file that is in split-brain:<br \/>\nIt can be obtained either by<br \/>\na) The command <code>gluster volume heal &lt;volname&gt; info split-brain<\/code>.<\/p>\n<p><code>[root@server1 ~]# gluster volume heal images info split-brain<br \/>\nGathering list of split brain entries on volume images has been successful<\/code><\/p>\n<p><code>Brick server1-gluster:\/export\/images\/brick1<br \/>\nNumber of entries: 1024<br \/>\nat path on brick<br \/>\n-----------------------------------<br \/>\n2017-02-08 06:14:56 \/srvmsim01v.img<br \/>\n2017-02-08 07:55:53 \/srvmmdb02v.img<br \/>\n2017-02-08 07:55:53 
\/srvmmgw02v.img<br \/>\netc.<\/code><\/p>\n<p><code>Brick server2-gluster:\/export\/images\/brick1<br \/>\nNumber of entries: 1024<br \/>\nat path on brick<br \/>\n-----------------------------------<br \/>\n2017-03-01 09:36:44 \/srvmmgw02v.img<br \/>\n2017-03-01 09:37:45 \/srvmsim01v.img<br \/>\n2017-03-01 09:37:45 \/srvmmdb02v.img<br \/>\netc.<\/code><\/p>\n<p>b) Identifying the files for which file operations performed from the client keep failing with an Input\/Output error.<\/p>\n<p>Close any applications that have the file open through the mount point. Virtual machines using the image must be powered off.<\/p>\n<p>Decide on the correct copy:<br \/>\nInspect the AFR changelog extended attributes of the file on each brick with the getfattr command, identify the type of split-brain (data split-brain, metadata split-brain, entry split-brain or split-brain due to gfid mismatch), and then determine which brick contains the &#8216;good copy&#8217; of the file.<br \/>\n<code>getfattr -d -m . 
-e hex &lt;file-path-on-brick&gt;<\/code>.<\/p>\n<p>Each <code>trusted.afr.*<\/code> value is made up of three 32-bit counters:<\/p>\n<pre>0x 000003d7 00000001 00000000\r\n      |        |        |\r\n      |        |        \\_ changelog of directory entries\r\n      |        \\_ changelog of metadata\r\n      \\_ changelog of data<\/pre>\n<p>It is also possible for one brick to hold the correct data while the other holds the correct metadata.<\/p>\n<p><code>[root@server1 ~]# stat \/export\/images\/brick1\/srvmsim01v.img<br \/>\nFile: `\/export\/images\/brick1\/srvmsim01v.img'<br \/>\nSize: 15228796928 Blocks: 29743528 IO Block: 4096 regular file<br \/>\nDevice: 812h\/2066d Inode: 116 Links: 2<br \/>\nAccess: (0600\/-rw-------) Uid: ( 0\/ root) Gid: ( 0\/ root)<br \/>\nAccess: 2017-03-01 09:46:23.599184357 -0500<br \/>\nModify: 2017-03-01 12:23:04.693187960 -0500<br \/>\nChange: 2017-03-01 12:23:21.924583877 -0500<\/code><\/p>\n<p><code>[root@server1 ~]# md5sum \/export\/images\/brick1\/srvmsim01v.img<br \/>\ncb21a48ee44309cd0a2bcf6bec4c0f7c \/export\/images\/brick1\/srvmsim01v.img<\/code><\/p>\n<p><code>[root@server2 ~]# stat \/export\/images\/brick1\/srvmsim01v.img<br \/>\nFile: `\/export\/images\/brick1\/srvmsim01v.img'<br \/>\nSize: 15228796928 Blocks: 22730808 IO Block: 4096 regular file<br \/>\nDevice: 812h\/2066d Inode: 115 Links: 2<br \/>\nAccess: (0600\/-rw-------) Uid: ( 0\/ root) Gid: ( 0\/ root)<br \/>\nAccess: 2015-09-16 19:01:35.737767450 -0400<br \/>\nModify: 2017-03-01 12:23:17.320088786 -0500<br \/>\nChange: 2017-03-01 12:23:34.542382249 -0500<\/code><\/p>\n<p><code>[root@server2 ~]# md5sum \/export\/images\/brick1\/srvmsim01v.img<br \/>\n5062e0f3ef1a0a2c36825cd769366276 \/export\/images\/brick1\/srvmsim01v.img<\/code><\/p>\n<p><code>[root@server1 ~]# getfattr -d -m . 
-e hex \/export\/images\/brick1\/srvmsim01v.img<br \/>\ngetfattr: Removing leading '\/' from absolute path names<br \/>\n# file: export\/images\/brick1\/srvmsim01v.img<br \/>\ntrusted.afr.dirty=0x000000000000000000000000<br \/>\ntrusted.afr.images-client-0=0x000000000000000000000000<br \/>\ntrusted.afr.images-client-1=0x015457d20000000000000000<br \/>\ntrusted.gfid=0x43304ae0fa284e178e8364b837b30925<\/code><\/p>\n<p><code>[root@server2 ~]# getfattr -d -m . -e hex \/export\/images\/brick1\/srvmsim01v.img<br \/>\ngetfattr: Removing leading '\/' from absolute path names<br \/>\n# file: export\/images\/brick1\/srvmsim01v.img<br \/>\ntrusted.afr.dirty=0x000000000000000000000000<br \/>\ntrusted.afr.images-client-0=0x000000040000000000000000<br \/>\ntrusted.afr.images-client-1=0x000000000000000000000000<br \/>\ntrusted.gfid=0x43304ae0fa284e178e8364b837b30925<\/code><\/p>\n<p>The two copies differ in md5sum, block count and inode number, but have the same size. On both bricks only the data changelog (the first eight hex digits of the <code>trusted.afr.images-client-*<\/code> value) is non-zero, and each brick blames the other, so this is a data split-brain; the metadata is not affected. I decided to keep the copy on the first brick (server1).<\/p>\n<p>Reset the relevant extended attribute on the brick(s) that contain the &#8216;bad copy&#8217; of the file data\/metadata using the setfattr command.<br \/>\n<code>setfattr -n &lt;attribute-name&gt; -v &lt;attribute-value&gt; &lt;file-path-on-brick&gt;<\/code><\/p>\n<p><code>[root@server2 ~]# setfattr -n trusted.afr.images-client-0 -v 0x000000000000000000000000 \/export\/images\/brick1\/srvmsim01v.img<\/code><\/p>\n<p>Trigger self-heal on the file by performing a lookup from the client:<br \/>\n<code>ls -l &lt;file-path-on-gluster-mount&gt;<\/code><\/p>\n<p><code>[root@server1 ~]# ls -l \/export\/images\/brick1\/srvmsim01v.img<br \/>\n-rw------- 2 qemu qemu 15236399104 Mar 1 17:37 \/export\/images\/brick1\/srvmsim01v.img<br \/>\n[root@server1 ~]# ls -l \/var\/lib\/libvirt\/images\/srvmsim01v.img<br \/>\n-rw------- 1 qemu qemu 15236399104 Mar 1 17:38 \/var\/lib\/libvirt\/images\/srvmsim01v.img<\/code><\/p>\n<p>Links:<br \/>\n<a 
href=\"https:\/\/gluster.readthedocs.io\/en\/latest\/Troubleshooting\/split-brain\/\">https:\/\/gluster.readthedocs.io\/en\/latest\/Troubleshooting\/split-brain\/<\/a><br \/>\n<a href=\"https:\/\/gluster.readthedocs.io\/en\/latest\/Troubleshooting\/heal-info-and-split-brain-resolution\/\">https:\/\/gluster.readthedocs.io\/en\/latest\/Troubleshooting\/heal-info-and-split-brain-resolution\/<\/a> (if you are running GlusterFS 3.7 or higher)<br \/>\n<a href=\"https:\/\/github.com\/gluster\/glusterfs\/blob\/master\/doc\/debugging\/split-brain.md\">https:\/\/github.com\/gluster\/glusterfs\/blob\/master\/doc\/debugging\/split-brain.md<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Identify the bricks info: [root@server1 ~]# gluster volume info images Volume Name: images Type: Replicate Volume ID: e60b5d4b-be1f-4233-b09c-84a97001021f Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: server1-gluster:\/export\/images\/brick1 Brick2: server2-gluster:\/export\/images\/brick1 Options Reconfigured: diagnostics.count-fop-hits: on diagnostics.latency-measurement: on performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.stat-prefetch: off cluster.eager-lock: enable network.remote-dio: enable cluster.quorum-type: fixed 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[231],"tags":[212,220,32,28,14,11],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/posts\/696"}],"collection":[{"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/comments?post=696"}],"version-history":[{"count":9,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/posts\/696\/revisions"}],"predecessor-version":[{"id":706,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/posts\/696\/revisions\/706"}],"wp:attachment":[{"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/media?parent=696"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/categories?post=696"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/tags?post=696"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}