Monday, February 15, 2010

DRBD Configuration on Linux

High-Availability with Fedora, DRBD, Heartbeat and Mon [and Xen]

Prerequisites:
1. You must not be an idiot.
2. make, gcc, the glibc development libraries, and the flex scanner generator must be installed on your target systems (the command just below will install them).
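On Fedora, something like this should take care of prerequisite #2 in one shot (the package names are my guess at the Fedora equivalents; adjust as needed):

sudo yum install make gcc glibc-devel flex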

Now that we’ve gotten that out of the way, let’s get started. Remember to perform each step on BOTH your machines.
Do everything as your own user, not as root, unless otherwise noted.

DRBD:
This was a pain in the ass. I followed Chapter 4 of the DRBD User Guide, somewhat.

First, we need the kernel headers:

sudo yum install kernel-devel

Next, we need to obtain the kernel sources:

sudo yum install yum-utils
cd /tmp
yumdownloader --source kernel

Install the kernel source:

sudo rpm -ivh kernel-2.6.24.3-50.fc8.src.rpm

Ignore any warnings about non-existent user/group.

Prepare the kernel source to be usable:

sudo rpmbuild -bp --target=$(uname -m) /usr/src/redhat/SPECS/kernel.spec
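If that rpmbuild step worked, a prepared kernel tree should now be sitting under /usr/src/redhat/BUILD. The exact directory name depends on your kernel version, but a quick listing (this check is my addition) will confirm it's there:

ls -d /usr/src/redhat/BUILD/kernel-*/linux-*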

Download and extract the DRBD sources in, let’s pick an arbitrary directory, /tmp:

cd /tmp
wget http://oss.linbit.com/drbd/drbd-8.2.5.tar.gz
tar -xzf drbd-8.2.5.tar.gz

Build DRBD for the currently running kernel:

[vic@ares:~$] cd /tmp
[vic@ares:/tmp$] cd drbd-8.2.5
[vic@ares:/tmp/drbd-8.2.5$] cd drbd
[vic@ares:/tmp/drbd-8.2.5/drbd$] make clean all

You will now find a file called drbd.ko in this directory. You can interrogate it to make sure everything’s okay:

[vic@ares:/tmp/drbd-8.2.5/drbd$] /sbin/modinfo drbd.ko
filename: drbd.ko
alias: block-major-147-*
license: GPL
description: drbd - Distributed Replicated Block Device v8.2.5
author: Philipp Reisner, Lars Ellenberg
srcversion: BC9E03E6896BF68FDE41F44
depends:
vermagic: 2.6.24.3-50.fc8 SMP mod_unload
parm: minor_count:Maximum number of drbd devices (1-255) (int)
parm: allow_oos:DONT USE! (bool)
parm: enable_faults:int
parm: fault_rate:int
parm: fault_count:int
parm: fault_devs:int
parm: trace_level:int
parm: trace_type:int
parm: trace_devs:int
parm: usermode_helper:string

Next, I built RPMs to install rather than just running make install from /tmp/drbd-8.2.5, since that didn't work out for me.

[vic@ares:/tmp/drbd-8.2.5$] make rpm
[vic@ares:/tmp/drbd-8.2.5$] cd dist/RPMS/x86_64/
[vic@ares:/tmp/drbd-8.2.5/dist/RPMS/x86_64$] ll
total 1.2M
-rw-r--r-- 1 vic vic 213K 2008-04-01 12:28 drbd-8.2.5-3.x86_64.rpm
-rw-r--r-- 1 vic vic 122K 2008-04-01 12:28 drbd-debuginfo-8.2.5-3.x86_64.rpm
-rw-r--r-- 1 vic vic 834K 2008-04-01 12:28 drbd-km-2.6.24.3_50.fc8-8.2.5-3.x86_64.rpm

Sweet sunshine. Install them.

[vic@ares:/tmp/drbd-8.2.5/dist/RPMS/x86_64$] sudo rpm -ivh *.rpm

Now we need to configure DRBD. See Chapter 5 of the DRBD User Guide. This is what my /etc/drbd.conf looks like; it assumes my two machines are hooked up back-to-back on a gigabit network. YMMV:

global {
    usage-count yes;
}

common {
    syncer { rate 40M; }
    protocol C;
}

resource r0 {
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
    }

    net {
        cram-hmac-alg "sha1";
        shared-secret "FooFunFactory";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 100M;
        al-extents 257;
    }

    on ares.bluethings.net {
        device /dev/drbd0;
        disk /dev/sda5;
        address 172.16.6.2:7788;
        flexible-meta-disk internal;
    }

    on mars.bluethings.net {
        device /dev/drbd0;
        disk /dev/sda5;
        address 172.16.6.3:7788;
        meta-disk internal;
    }
}
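Once /etc/drbd.conf is identical on both machines, it doesn't hurt to have drbdadm parse the config back at you to catch typos before going further (this sanity check is my addition, not from the DRBD guide):

sudo drbdadm dump r0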

Now we need to prepare and enable our resources:

[vic@ares:/tmp/drbd-8.2.5/drbd$] sudo su
[root@ares:~#] drbdadm create-md r0
md_offset 107372765184
al_offset 107372732416
bm_offset 107369455616

Found ext3 filesystem which uses 104852984 kB
current configuration leaves usable 104852984 kB

==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

New drbd meta data block successfully created.
[root@ares:~#] drbdadm attach r0
[root@ares:~#] drbdadm connect r0
[root@ares:~#] cat /proc/drbd
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by vic@ares.bluethings.net, 2008-04-01 01:35:36
0: cs:Connected st:Secondary/Secondary ds:Inconsistent/Inconsistent C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

Okay, everything is working so far.

Now, we need to perform the INITIAL FULL SYNCHRONIZATION. Type this only on your primary node.

[root@ares:~#] drbdadm -- --overwrite-data-of-peer primary r0
[root@ares:~#] service drbd start
Starting DRBD resources: [ s(r0) ].

You can monitor the progress of the sync with cat /proc/drbd.
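If you don't feel like re-running cat by hand, watch does the trick (my habit, not from the guide):

watch -n1 cat /proc/drbd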

Okay, let’s create the filesystem on our primary node.

[root@ares:~#] mkfs.ext3 /dev/drbd0

This filesystem will be automatically checked every 37 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@ares:~#] tune2fs -c -1 -i 0 /dev/drbd0
tune2fs 1.40.4 (31-Dec-2007)
Setting maximal mount count to -1
Setting interval between checks to 0 seconds

That won’t work on the secondary node. If you feel like trying anyway:

[root@mars:~#] mkfs.ext3 /dev/drbd0
mke2fs 1.40.4 (31-Dec-2007)
mkfs.ext3: Wrong medium type while trying to determine filesystem size

Now, we create the mount points and mount the filesystem on our primary node:

[root@ares:/#] mkdir /data
[root@ares:/#] mount -o rw /dev/drbd0 /data

And create the mount point on the secondary node:

[root@mars:/#] mkdir /data

At this point, you kinda need to take my word for it that DRBD is working. But how do we know it'll really do its job when the time comes? Well, you could test it manually: create a file in /data, unmount /data, demote ares to a secondary node, then promote mars to primary. Oh hell, I'll show you.

[root@ares:/#] echo "I'm a genie in a bottle baby." >> /data/test_file
[root@ares:/#] cat /data/test_file
I'm a genie in a bottle baby.
[root@ares:/#] service drbd status
drbd driver loaded OK; device status:
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by vic@ares.bluethings.net, 2008-04-01 01:35:36
m:res cs st ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /data ext3
[root@ares:/#] umount /data
[root@ares:/#] drbdadm secondary r0
[root@ares:/#] service drbd status
drbd driver loaded OK; device status:
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by vic@ares.bluethings.net, 2008-04-01 01:35:36
m:res cs st ds p mounted fstype
0:r0 Connected Secondary/Secondary UpToDate/UpToDate C

And on mars:

[root@mars:/data#] drbdadm primary r0
[root@mars:/data#] mount -o rw /dev/drbd0 /data
[root@mars:/data#] service drbd status
drbd driver loaded OK; device status:
version: 8.2.5 (api:88/proto:86-88)
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by vic@mars.bluethings.net, 2008-04-01 01:37:37
m:res cs st ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /data ext3
[root@mars:/#] cat /data/test_file
I'm a genie in a bottle baby.

There we go. Now to install and configure heartbeat. Lucky for us, the documentation for this is infinitely better than DRBD's. See Getting Started at linux-ha.org.

Heartbeat:

[vic@ares:~$] sudo yum install heartbeat
[vic@ares:~$] sudo su
[root@ares:/home/vic#] cd /etc/ha.d
[root@ares:/etc/ha.d#] ll
total 24K
-rwxr-xr-x 1 root root 745 2008-03-05 20:20 harc
drwxr-xr-x 2 root root 4.0K 2008-04-01 00:32 rc.d
-rw-r--r-- 1 root root 692 2008-03-05 20:20 README.config
drwxr-xr-x 2 root root 4.0K 2008-04-01 01:37 resource.d
-rw-r--r-- 1 root root 7.1K 2008-03-05 20:20 shellfuncs

Now where the hell is ha.cf? Looking inside README.config shows:

The good news is that sample versions of these files may be found in
the documentation directory (providing you installed the documentation).

If you installed heartbeat using rpm packages then
this command will show you where they are on your system:
rpm -q heartbeat -d
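If you go the rpm route, piping through grep narrows the list down to the three files we care about (the grep is my addition):

rpm -q heartbeat -d | grep -E 'ha.cf|haresources|authkeys'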

Alternatively, we can use locate to find the files and copy them to /etc/ha.d:

[root@ares:/etc/ha.d#] updatedb
[root@ares:/etc/ha.d#] locate ha.cf
/usr/share/doc/heartbeat-2.1.3/ha.cf
[root@ares:/etc/ha.d#] cp /usr/share/doc/heartbeat-2.1.3/ha.cf .
[root@ares:/etc/ha.d#] cp /usr/share/doc/heartbeat-2.1.3/haresources .
[root@ares:/etc/ha.d#] cp /usr/share/doc/heartbeat-2.1.3/authkeys .

Then configure ha.cf pretty much the way the Linux HA "Configuring ha.cf" page tells you to. You can also check out howtoforge. I didn't bother with the watchdog or softdog stuff, and I set my bcast to eth1, since that's what my internal network runs on.
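For reference, a minimal ha.cf in the spirit of what I ended up with looks roughly like this (the values are illustrative, not a copy of my file; node names, interface and timeouts will differ on your setup):

logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth1
auto_failback on
node ares.bluethings.net
node mars.bluethings.net

authkeys just needs an auth method and a shared secret, and the file must be chmod 600:

auth 1
1 sha1 SomeSharedSecret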

When you are done configuring haresources, ha.cf and authkeys the way Linux HA and howtoforge tell you, copy all three files over to the other node so that they are exactly the same. To test failover, kill heartbeat on your primary machine and your secondary machine will take over the requests. An easy way to test this is to set up httpd to listen on a shared IP and have it print out the result of uname -a; obviously the output will differ depending on which machine is serving the requests.
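One throwaway way to do the uname -a trick is a tiny CGI script served by httpd on the floating IP. This is purely illustrative; the path assumes the stock Fedora httpd cgi-bin layout:

#!/bin/sh
# /var/www/cgi-bin/whichnode.cgi - prints which node answered the request
echo "Content-type: text/plain"
echo ""
uname -a

Make it executable (chmod +x) and hit http://<floating-ip>/cgi-bin/whichnode.cgi before and after killing heartbeat on the primary.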

For your reference, this is my /etc/ha.d/haresources on both machines:

ares.bluethings.net \
IPaddr::xx.yy.zz.w4/30/eth0/xx.yy.zz.w9 \
IPaddr::xx.yy.zz.w5/30/eth0/xx.yy.zz.w9 \
IPaddr::xx.yy.zz.w6/30/eth0/xx.yy.zz.w9 \
IPaddr::xx.yy.zz.w7/30/eth0/xx.yy.zz.w9 \
IPaddr::xx.yy.zz.w8/30/eth0/xx.yy.zz.w9 \
drbddisk::wwwroot Filesystem::/dev/drbd0::/wwwroot::ext3::rw \
drbddisk::sqlroot Filesystem::/dev/drbd1::/sqlroot::ext3::rw \
drbddisk::mailroot Filesystem::/dev/drbd2::/mailroot::ext3::rw \
mysqld httpd dkimproxy cyrus-imapd saslauthd postfix mon
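Each item on that line maps to a resource agent in /etc/ha.d/resource.d (or an init script in /etc/init.d), which heartbeat calls with the ::-separated parameters plus start or stop. Roughly speaking, on takeover it runs the equivalent of this for the first DRBD resource (paraphrased, to show what's going on under the hood):

/etc/ha.d/resource.d/drbddisk wwwroot start
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /wwwroot ext3 rw start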

And finally, I set up mon. The thing about heartbeat is that it'll do a failover if your machine dies, but heartbeat has no way of knowing when mysqld or httpd or postfix or one of your other crucial services dies. This is where mon comes in. If my httpd service dies, mon will shoot me a couple of alert emails and kill heartbeat, thus causing the failover to happen. It's beautiful, really. Installing and configuring mon is pretty easy from the README and INSTALL files, so I won't go into much detail. Make sure mon is the last thing you start in your haresources file (or at least make sure it comes AFTER all your monitored services), otherwise you're going to hit a race condition where mon starts up before, say, mysqld, sees that mysqld is not running, and shuts down heartbeat.
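The "kill heartbeat" part is just a mon alert script. Mine boils down to something like the following hypothetical sketch (mon passes alert details as arguments, which I simply ignore here; the script name is made up):

#!/bin/sh
# stop-heartbeat.alert - hypothetical mon alert script, dropped in /usr/lib/mon/alert.d
# A monitored service has failed; stop heartbeat so the peer takes over.
logger -t mon "monitored service failed, stopping heartbeat to force failover"
/sbin/service heartbeat stop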

Any questions? No? Great. Have a happy high-availability cluster.

Late addition:
If you decide to use Xen, all you need to do is the above on one machine, use the DRBD partition to store the Xen images in, and then a live migration of sorts becomes possible when heartbeat switches over to the other machine.
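In Xen terms that just means pointing each guest's disk at an image file on the DRBD-backed mount, something like this line in the domU config file (illustrative path, assuming the DRBD filesystem is mounted at /data):

disk = [ 'file:/data/xen/vm1.img,xvda,w' ]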

Update on 7/28/2008: Some of you have contacted me asking what the ‘ll’ command I’m typing is. On my system, ‘ll’ is an alias for ‘ls -lh’.
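In other words:

alias ll='ls -lh'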

References:
http://www.mjmwired.net/resources/mjm-fedora-f8.html#kernelheaders
http://www.drbd.org/users-guide
https://services.ibb.gatech.edu/wiki/index.php/Howto:Software:DRBD
http://www.linux-ha.org
http://www.howtoforge.com/high_availability_heartbeat_centos
http://mon.wiki.kernel.org/index.php/Main_Page
http://kjalleda.googlepages.com/mysqlfailover
Original post ("High-Availability with Fedora, DRBD, Heartbeat and Mon [and Xen]", published 04.17.08 on sudo make me a sandwich):

http://little.bluethings.net/2008/04/17/high-availability-with-fedora-drbd-heartbeat-and-mon/
