1. Correct start/stop order
1.1. Start order
- master:
mfsmaster start
- all chunkservers:
mfschunkserver start
- metalogger:
mfsmetalogger start
- client:
mfsmount ...
1.2. Stop order
- client:
umount
- all chunkservers:
mfschunkserver stop
- metalogger:
mfsmetalogger stop
- master:
mfsmaster stop
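Since the order matters (the master must be up before chunkservers register and before clients mount), a wrapper script can enforce it. A minimal sketch, assuming passwordless root SSH and the hostnames from section 7.1 (master.mfs, chunk1.mfs, client.mfs); not part of the original procedure:
#!/bin/bash
# start-mfs.sh -- start the whole cluster in the documented order
ssh root@master.mfs 'mfsmaster start'
# 1. master first
ssh root@chunk1.mfs 'mfschunkserver start'
# 2. every chunkserver
ssh root@master.mfs 'mfsmetalogger start'
# 3. metalogger
ssh root@client.mfs 'mfsmount /mnt/mfs -H master.mfs'
# 4. clients last (/mnt/mfs must already exist on the client)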
2. Replica count
Run these on the client:
mfssetgoal -r 3 /mnt/mfs
# Set the number of replicas (the goal); 3 copies are recommended
# -r means recursive
mfsgetgoal /mnt/mfs
# Show the goal of a file or directory
mfsdirinfo -H /mnt/mfs
# Show directory information (-H for human-readable sizes)
mfsfileinfo /mnt/mfs/passwd
mfscheckfile /mnt/mfs/passwd
# Create a 65 MB file so that it spans two chunks
mkdir /mnt/mfs/4copy
mfssetgoal -r 4 /mnt/mfs/4copy
dd if=/dev/zero of=/mnt/mfs/4copy/65m.img bs=1M count=65
mfsfileinfo /mnt/mfs/4copy/65m.img
2.1. Notes
- Set the goal right after installation
- Create a multi-level directory tree
- Different directories can use different goals (see the example after this list)
- Do not put every file into a single directory
- Plan for directory migration (no single directory should grow too large)
- MFS stores data as multiple chunks
- A chunk is 64 MB
- Anything larger occupies two (or more) chunks
- Similar to a block in a conventional filesystem
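A short illustration of per-directory goals, assuming the /mnt/mfs mount point used above; the directory names data/important and data/scratch are made up for this example:
mkdir -p /mnt/mfs/data/important /mnt/mfs/data/scratch
mfssetgoal -r 3 /mnt/mfs/data/important
# critical data: 3 replicas
mfssetgoal -r 2 /mnt/mfs/data/scratch
# scratch data: 2 replicas are enough
mfsgetgoal /mnt/mfs/data/important /mnt/mfs/data/scratch
mfsdirinfo -H /mnt/mfs/data
# per-directory usage, human-readable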
3. Trash (recycle bin)
# Show the trash retention time of a file
mfsgettrashtime /mnt/mfs/passwd
mfssettrashtime -r 1200 /mnt/mfs/4copy/
# Works for both files and directories
# 1200 seconds
3.1. Mounting the trash
First, mounting the meta filesystem (".") must be enabled in mfsexports.cfg
Three directories matter:
- trash: the trash directory itself
- undel: move an entry from trash into undel and it is restored
- reserved: files that were deleted while still open (in use); they are removed once closed
mkdir /mnt/mfs-trash/
mfsmount -H test.mfs -m /mnt/mfs-trash/
cd /mnt/mfs-trash/trash
find ./ -type f
# Use find to locate the file
mv 002/00000002\|passwd undel/
# Restore it
# Note that the file name may need escaping
ls /mnt/mfs/passwd
# The file reappears at its original location
4. Using rsync to sync and back up the metadata logs
Here the data on the master is synced straight to the client machine
4.1. Backup machine (rsync server) setup
The backup machine here is the client host
sudo yum install -y rsync
sudo useradd -M -s /sbin/nologin rsync
sudo vim /etc/rsyncd.conf
# Contents:
uid = root
gid = root
use chroot = no
max connections = 200
timeout = 300
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
log file = /var/log/rsyncd.log
[backup_mfs]
comment = metadata from master
path = /data
ignore errors
read only = false
list = false
hosts allow = 192.168.137.0/24
hosts deny = 0.0.0.0/32
auth users = rsync_nobody
secrets file = /etc/rsync.password
# Then create the password file
sudo vim /etc/rsync.password
# Contents:
rsync_nobody:nobody
# Permissions must be 600
chmod 600 /etc/rsync.password
sudo mkdir /data
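The notes configure rsyncd but never start the daemon. A hedged sketch of that missing step, assuming the distribution ships an rsyncd systemd unit (otherwise the standalone daemon invocation works):
sudo systemctl enable --now rsyncd
# or, without a systemd unit:
sudo rsync --daemon --config=/etc/rsyncd.conf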
4.2. Source machine (rsync client) setup
sudo yum install -y rsync
sudo su
echo "nobody">> /etc/rsync.password
chmod 600 /etc/rsync.password
# Push the data to the backup machine
rsync -avz /var/lib/mfs rsync_nobody@client.mfs::backup_mfs --password-file=/etc/rsync.password
# A hostname is used here; an IP address works just as well
# The configuration files need to be backed up too
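Since the configuration files also need backing up, one hedged option is to push /etc/mfs into a subdirectory of the same module (rsync creates the final missing path component under the module, here etc_mfs):
rsync -avz /etc/mfs rsync_nobody@client.mfs::backup_mfs/etc_mfs --password-file=/etc/rsync.password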
# Install sersync
wget https://raw.githubusercontent.com/wsgzao/sersync/master/sersync2.5.4_64bit_binary_stable_final.tar.gz
tar zxf sersync2.5.4_64bit_binary_stable_final.tar.gz
mv GNU-Linux-x86/ /usr/local/sersync
cd /usr/local/sersync
mkdir conf bin log
mv confxml.xml conf
mv sersync2 bin/
echo 'export PATH=$PATH:/usr/local/sersync/bin' >> /etc/profile
# Edit the sersync configuration file
cp conf/confxml.xml conf/confxml.xml.backup
vim conf/confxml.xml
# sersync->localpath # path to watch and sync
# sersync->localpath->remote ip # address of the remote rsync server; several can be configured
# sersync->localpath->name # rsync module name on the remote server
# sersync->rsync->auth # rsync authentication; the timeout below it is optional
# sersync->failLog # script run when a sync fails (a sketch of the edited fragment follows)
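A hedged sketch of what the edited fragment of confxml.xml might look like; the attribute names follow the stock sersync 2.5 template and the IP/module values mirror this setup, so verify them against your own copy:
<sersync>
    <localpath watch="/var/lib/mfs">
        <remote ip="192.168.137.33" name="backup_mfs"/>
    </localpath>
    <rsync>
        <commonParams params="-artuz"/>
        <auth start="true" users="rsync_nobody" passwordfile="/etc/rsync.password"/>
        <userDefinedPort start="false" port="874"/>
        <timeout start="true" time="100"/>
        <ssh start="false"/>
    </rsync>
    <failLog path="/usr/local/sersync/log/rsync_fail_log.sh" timeToExecute="60"/>
</sersync>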
vim /usr/local/sersync/log/rsync_fail_log.sh
# Contents:
#!/bin/bash
# Purpose: check whether sersync is still alive; restart it if it is not
SERSYNC='/usr/local/sersync/bin/sersync2'
CONF_FILE='/usr/local/sersync/conf/confxml.xml'
STATUS=$(ps aux |grep 'sersync2'|grep -v 'grep'|wc -l)
if [ $STATUS -eq 0 ];
then
$SERSYNC -d -r -o $CONF_FILE &
else
exit 0;
fi
# Start sersync
sersync2 -r -d -o /usr/local/sersync/conf/confxml.xml
# Add it to cron
crontab -e
*/1 * * * * /bin/bash /usr/local/sersync/log/rsync_fail_log.sh > /dev/null 2>&1
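To confirm the whole chain works, a quick sanity check with a made-up file name: create a file under the watched path on the master and look for it on the backup machine a few seconds later. sersync pushes paths relative to the watched directory, so the copy should land directly under the module path /data (the one-off rsync above created /data/mfs instead):
touch /var/lib/mfs/sersync_test_file
# run on the master (source machine)
ls /data/sersync_test_file
# run on the backup machine a few seconds later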
5. Failure simulation and recovery
5.1. Chunkserver failure
- First add one more chunkserver
- Then shut one chunkserver down
- Check the replica count:
mfsfileinfo /mnt/mfs/passwd
- The number of replicas should drop
- Shut down another chunkserver
- The replica count falls to 0
- The mount point is unmounted automatically
- The client can no longer read or write files
- Once the chunkservers are restarted, the replica counts and everything else recover
5.2. Master failure
- Shut down the master
- Clients hang
- Once it is restarted, clients remount automatically
5.3. Recovering the master
- Shut down the master and delete its data directory outright:
/var/lib/mfs
- Install a fresh mfsmaster
- Configure it the same way (recover mfsmaster.cfg from the backups)
- Recover the metadata.mfs.back file
- Strictly speaking the whole data directory is needed
- It can come from the rsync backup (the best option, since real-time backup is configured)
- Or from the metalogger host, if the metalogger service was running (see the sketch after this list)
- Put metadata.mfs.back into the data directory, /var/lib/mfs by default
- Recover:
mfsmaster -a
- Sometimes -i is needed to fix certain errors automatically
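A minimal sketch of the metalogger route, assuming the metalogger keeps its files in the default /var/lib/mfs on a host called metalogger.mfs (both are assumptions), and that mfsmaster -a can replay the changelog_ml files; depending on the MooseFS version they may first need renaming to changelog.*.mfs:
scp root@metalogger.mfs:/var/lib/mfs/metadata_ml.mfs.back /var/lib/mfs/metadata.mfs.back
scp root@metalogger.mfs:'/var/lib/mfs/changelog_ml.*.mfs' /var/lib/mfs/
# copy with mfsmaster stopped on the new master
chown -R mfs:mfs /var/lib/mfs
mfsmaster -a
# replays the changelogs against the recovered metadata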
6. Monitoring ideas
- Monitor the processes
- On the client, run a script every minute that writes to and reads from MFS (a sketch follows)
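A minimal sketch of such a probe, assuming the /mnt/mfs mount point from above; the file name, script path and log path are made up:
#!/bin/bash
# mfs_probe.sh -- write a timestamp into MFS and read it back
PROBE=/mnt/mfs/.mfs_probe
STAMP=$(date +%s)
echo "$STAMP" > "$PROBE" && [ "$(cat "$PROBE")" = "$STAMP" ] \
    || echo "$(date) mfs probe failed" >> /var/log/mfs_probe.log
# crontab entry:
# */1 * * * * /bin/bash /usr/local/bin/mfs_probe.sh > /dev/null 2>&1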
7. MFS + KeepAlived high availability
7.1. Layout
- 192.168.137.30: hostname master.mfs
- mfsmaster
- mfsmetalogger
- keepalived master
- 192.168.137.31: hostname backup.mfs
- mfsmaster (standby)
- mfsmetalogger
- keepalived backup
- 192.168.137.32: hostname chunk1.mfs
- 192.168.137.33: hostname client.mfs
- 192.168.137.34: VIP
- Moves to the backup machine when the master goes down.
7.2. Common steps
These build directly on the existing setup
# on both master and backup
sudo yum install -y keepalived moosefs-master moosefs-cgi moosefs-cgiserv moosefs-cli moosefs-metalogger
# The default configuration was used so far, so the master and the metalogger both store data in /var/lib/mfs
# Stop client, chunkservers, metalogger and master, in that order
sudo vim /etc/mfs/mfsmaster.cfg
# Change DATA_PATH to /var/lib/mfsmaster
# Note: stop the service first, copy the old data directory over, then start again, otherwise data will be lost
sudo cp -r /var/lib/mfs /var/lib/mfsmaster
sudo chown -R mfs:mfs /var/lib/mfsmaster
sudo vim /etc/mfs/mfsmetalogger.cfg
# Change DATA_PATH to /var/lib/mfsmetalogger (the path the failover scripts below expect)
sudo mkdir /var/lib/mfsmetalogger
sudo chown -R mfs:mfs /var/lib/mfsmetalogger
7.3. Master configuration
sudo vim /etc/keepalived/keepalived.conf
# Contents:
vrrp_script check_run {
script "/home/madao/script/shell/mfs/keepalived_check_mfsmaster.sh"
interval 2
}
vrrp_sync_group VG1 {
group {
VI_1
}
}
vrrp_instance VI_1 {
state MASTER
interface ens33
virtual_router_id 50
priority 100
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 111111
}
track_script {
check_run
}
virtual_ipaddress {
192.168.137.34
}
}
# Health-check script
vim /home/madao/script/shell/mfs/keepalived_check_mfsmaster.sh
# Contents:
#!/bin/bash
# Note: keepalived checks mfsmaster every 2 seconds; as soon as the master is down it stops keepalived, the backup takes over the VIP and recovers from the metalogger data
MFSMASTER_HOST=192.168.137.34
MFSMASTER_PORT=9420
#CHECK_MASTER=/usr/local/nagios/libexec/check_tcp
CHECK_TIME=2
CMD_CHECK=/usr/bin/nmap
# MFS_OK is 1 while mfsmaster is working, 0 when it is down
MFS_OK=1
function check_mfsmaster (){
#$CHECK_MASTER -H $MFSMASTER_HOST -p $MFSMASTER_PORT >/dev/null 2>&1
#ret=$(CMD_CHECK $MFSMASTER_HOST -p $MFSMASTER_PORT|grep open|wc -l)
ret=$(netstat -lnp| grep 9420 |wc -l)
if [ $ret -eq 1 ] ;
then
MFS_OK=1
else
MFS_OK=0
fi
return $MFS_OK
}
while [ $CHECK_TIME -ne 0 ]
do
let CHECK_TIME=CHECK_TIME-1
#sleep 1
check_mfsmaster
if [ $MFS_OK -eq 1 ] ;
then
CHECK_TIME=0
exit 0
fi
if [ $MFS_OK -eq 0 ] && [ $CHECK_TIME -eq 0 ]
then
systemctl stop keepalived
exit 1
fi
done
sudo systemctl start keepalived
sudo mfsmaster start
sudo mfsmetalogger start
sudo mfscgiserv start
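After everything is up on the primary, it is worth confirming that the VIP is bound and that mfsmaster is listening; a quick check (the interface name ens33 comes from the keepalived config above):
ip addr show ens33 | grep 192.168.137.34
# the VIP should appear on the primary
ss -lnt | grep -E '9419|9420|9421'
# the three mfsmaster listening ports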
7.4. Slave (backup) configuration
sudo vim /etc/keepalived/keepalived.conf
# Contents:
vrrp_sync_group VG1 {
group {
VI_1
}
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 50
priority 99
advert_int 1
authentication {
auth_type PASS
auth_pass 111111
}
virtual_ipaddress {
192.168.137.34
}
notify_backup /home/madao/script/shell/mfs/backup.sh
notify_master /home/madao/script/shell/mfs/master.sh
# runs when this node takes over as MASTER
}
# The notify scripts are as follows
vim /home/madao/script/shell/mfs/master.sh
# Contents:
#!/bin/bash
# Note: when the primary keepalived goes down, this node takes over the VIP, deletes the old mfsmaster data, copies the metalogger data into the mfsmaster data directory, recovers it with mfsmaster -a, and starts mfsmaster (and the CGI server)
# Note 2: once the primary's keepalived is back and has reclaimed the VIP (about 10 s later), this node drops back to BACKUP and shuts down its local mfsmaster
MFSMASTER=/sbin/mfsmaster
MFS_CGIS=/sbin/mfscgiserv
#MFSMETARESTORE=/sbin/mfsmetarestore
BACK_DIR=/data/mfs_backup
MFSMASTER_DATA_PATH=/var/lib/mfsmaster
MFSMETALOGGER_DATA_PATH=/var/lib/mfsmetalogger
TIME=$(date +%F_%H-%M-%S)
function backup2master(){
#$MFSMETARESTORE -m ${MFSMASTER_DATA_PATH}/metadata.mfs.bak -o ${MFSMASTER_DATA_PATH}/metadata.mfs $MFSMASTER_DATA_PATH/changelog_ml*.mfs
cd ${MFSMETALOGGER_DATA_PATH}
tar zcvf mfs.log.${TIME}.tar.gz ./*
[ ! -d ${BACK_DIR} ] && mkdir -p ${BACK_DIR} && /bin/mv mfs.log.*.tar.gz ${BACK_DIR}
cd ${MFSMASTER_DATA_PATH} && rm -rf changelog_ml* metadata* Master_change.OK
cp ${MFSMETALOGGER_DATA_PATH}/* ${MFSMASTER_DATA_PATH}
#touch ${MFSMASTER_DATA_PATH}/metadata.mfs
#touch ${MFSMASTER_DATA_PATH}/metadata.back
echo $TIME >${MFSMASTER_DATA_PATH}/Master_change.OK
chown -R mfs.mfs ${MFSMASTER_DATA_PATH}
$MFSMASTER -a
#$MFSMETARESTORE -a
#$MFSMASTER stop
#$MFS_CGIS stop
sleep 2
#$MFSMASTER start
#$MFS_CGIS start
}
backup2master
chmod a+x /home/madao/script/shell/mfs/master.sh
vim /home/madao/script/shell/mfs/backup.sh
# Contents:
#!/bin/bash
MFSMASTER=/sbin/mfsmaster
function master2backup(){
$MFSMASTER stop
}
master2backup
chmod a+x /home/madao/script/shell/mfs/backup.sh
sudo systemctl start keepalived
sudo mfsmetalogger start
7.5. Simulating a failure and recovering
- Shut down mfsmaster on the primary
- The check script stops keepalived
- The backup takes over the VIP
- Its notify script starts mfsmaster
- Test:
- Start mfsmaster on the primary again
- Since keepalived on the primary is still stopped, it does not hold the VIP
- Note:
- The mfsmaster actually serving requests is now the one on the backup
- The metalogger on the primary is merely replaying the logs
- Starting keepalived on the primary at this point may lose files
The correct procedure on the primary is to sync from the metalogger data first and then start keepalived right away:
vim /home/madao/script/shell/mfs/start_master_mfsmaster.sh
# Contents:
#!/bin/bash
MFSMASTER=/sbin/mfsmaster
#MFSMETARESTORE=/sbin/mfsmetarestore
MFSMASTER_DATA_PATH=/var/lib/mfsmaster
MFSMETALOGGER_DATA_PATH=/var/lib/mfsmetalogger
MFS_CGIS=/sbin/mfscgiserv
TIME=$(date +%F_%H-%M-%S)
BACK_DIR=/data/mfs_backup
function MfsMasterStart()
{
cd ${MFSMETALOGGER_DATA_PATH}
tar -zcvf mfsmasterlog.${TIME}.tar.gz ./*
[ ! -d ${BACK_DIR} ] && mkdir -p ${BACK_DIR} && /bin/mv *.tar.gz ${BACK_DIR}
cd ${MFSMASTER_DATA_PATH}
rm -rf changelog_ml* metadata* Master_change.OK
cp ${MFSMETALOGGER_DATA_PATH}/* ${MFSMASTER_DATA_PATH}
chown -R mfs.mfs ${MFSMASTER_DATA_PATH}
$MFSMASTER -a
#$MFSMASTER stop
#$MFS_CGIS stop
sleep 2
#$MFS_CGIS start
#$MFSMASTER start
echo $TIME >${MFSMASTER_DATA_PATH}/Master_change.OK
systemctl start keepalived
}
MfsMasterStart
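With this script in place, rejoining the primary after a failover comes down to running it once; it syncs from the metalogger copy and then brings keepalived back up:
sudo bash /home/madao/script/shell/mfs/start_master_mfsmaster.sh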
8. Notes
- keepalived's notify can be set up in two ways; the simplest is one script per state
- keepalived may have a bug: right after the first installation the notify scripts would not run at all, even when told to run as root; after a reboot everything worked, which is rather puzzling