BLUE21neo: March 2016

Sunday, March 27, 2016

[CentOS6] Installing Graphite + collectd

This post installs Graphite and collectd on CentOS 6.7 and graphs resource usage.
For details on Graphite and collectd, refer to their official pages.

1. Installing the EPEL repository

Graphite and collectd are installed with yum from the EPEL repository,
so first make the EPEL repository available.

[root@node01 ~]# yum install -y epel-release

2. Installing Graphite

Install Graphite and Apache.

[root@node01 ~]# yum -y install graphite-web python-carbon httpd

With Apache's default settings a warning is printed at startup, so set ServerName in the configuration.

[root@node01 ~]# cp /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.org
[root@node01 ~]# vi /etc/httpd/conf/httpd.conf
[root@node01 ~]# diff /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.org
276c276
< ServerName 127.0.0.1:80
---
> #ServerName www.example.com:80

Configure Apache to start automatically, then start the service.

[root@node01 ~]# chkconfig httpd on
[root@node01 ~]# service httpd start
Starting httpd:                                            [  OK  ]

Change the Graphite settings.

[root@node01 ~]# cp /etc/graphite-web/local_settings.py /etc/graphite-web/local_settings.py.org
[root@node01 ~]# vi /etc/graphite-web/local_settings.py
[root@node01 ~]# diff /etc/graphite-web/local_settings.py /etc/graphite-web/local_settings.py.org
13c13
< SECRET_KEY = 'secret1234'
---
> #SECRET_KEY = 'UNSAFE_DEFAULT'
18c18
< ALLOWED_HOSTS = [ '*' ]
---
> #ALLOWED_HOSTS = [ '*' ]
23c23
< TIME_ZONE = 'Asia/Tokyo'
---
> #TIME_ZONE = 'America/Los_Angeles'
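
The SECRET_KEY value in the diff above is a simple placeholder for a test environment. A minimal sketch of generating a random value instead, assuming any sufficiently long random string is acceptable here; the function name and the length of 50 are illustrative choices, not part of the original setup:

```python
import random
import string


def generate_secret_key(length=50):
    """Return a random alphanumeric string to paste into SECRET_KEY.

    The function name and the default length are illustrative assumptions.
    """
    # SystemRandom draws from the operating system's entropy source.
    rng = random.SystemRandom()
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


# Print a line that can be pasted into local_settings.py.
print("SECRET_KEY = '%s'" % generate_secret_key())
```

Run it once and copy the printed line into /etc/graphite-web/local_settings.py.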

Create the Graphite database (SQLite).

A user for the Graphite web UI can be created at this point, but this time it is skipped here and created later.

[root@node01 ~]# python /usr/lib/python2.6/site-packages/graphite/manage.py syncdb
/usr/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
  "use STATIC_URL instead.", DeprecationWarning)
/usr/lib/python2.6/site-packages/django/core/cache/__init__.py:82: DeprecationWarning: settings.CACHE_* is deprecated; use settings.CACHES instead.
  DeprecationWarning
Creating tables ...
Creating table account_profile
Creating table account_variable
Creating table account_view
Creating table account_window
Creating table account_mygraph
Creating table dashboard_dashboard_owners
Creating table dashboard_dashboard
Creating table events_event
Creating table auth_permission
Creating table auth_group_permissions
Creating table auth_group
Creating table auth_user_user_permissions
Creating table auth_user_groups
Creating table auth_user
Creating table django_session
Creating table django_admin_log
Creating table django_content_type
Creating table tagging_tag
Creating table tagging_taggeditem

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): no
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)

Change the owner of the database file so that Apache can access it.

[root@node01 lib]# chown apache.apache /var/lib/graphite-web/graphite.db

Configure Carbon Cache to start automatically, then start the service.

[root@node01 ~]# chkconfig carbon-cache on
[root@node01 ~]# service carbon-cache start
Starting carbon-cache: Starting carbon-cache (instance a)
                                                           [  OK  ]
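
Once carbon-cache is running, one way to confirm that it accepts data is to send a single data point by hand. The sketch below assumes carbon-cache listens for its plaintext protocol on TCP port 2003 (the same port used in the collectd configuration later in this post); the metric name `test.manual.value` and the function names are hypothetical:

```python
import socket
import time


def format_carbon_line(path, value, timestamp):
    """Build one plaintext-protocol line: metric path, value, Unix timestamp."""
    return "%s %s %d\n" % (path, value, int(timestamp))


def send_metric(path, value, host="127.0.0.1", port=2003, timeout=5.0):
    """Send a single data point to carbon-cache over TCP."""
    line = format_carbon_line(path, value, time.time())
    sock = socket.create_connection((host, port), timeout)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()


# Example call (needs a running carbon-cache, so it is left commented out):
# send_metric("test.manual.value", 42)
```

If the call succeeds, the metric should show up in the Graphite tree after a short delay.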

Restart Apache to apply the settings above.

[root@node01 ~]# service httpd restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]

Open the following URL in a browser and confirm that the web UI is displayed.

http://<IP address>/

3. Installing collectd

Install collectd.

[root@node01 ~]# yum install -y collectd

The plugin that sends data from collectd to Graphite depends on the collectd version:

4.10 and earlier: carbon_writer
5.1 and later: write_graphite

The collectd version installed from EPEL is 4.10, so this post uses carbon_writer.
carbon_writer must be installed separately.
Create a directory as shown below, then download the plugin from GitHub and place it there.

[root@node01 ~]# mkdir /opt/collectd-plugins
[root@node01 ~]# cd /opt/collectd-plugins
[root@node01 collectd-plugins]# curl -OL https://raw.githubusercontent.com/indygreg/collectd-carbon/master/carbon_writer.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  9118  100  9118    0     0  13154      0 --:--:-- --:--:-- --:--:-- 37834

Change the collectd configuration.
The name set in "Hostname" is the name displayed in the Graphite web UI.
"LineReceiverHost" specifies the server where Graphite is installed.
The name set in "MetricPrefix" lets you group metrics under that prefix.

[root@node01 ~]# cp /etc/collectd.conf /etc/collectd.conf.org
[root@node01 ~]# vi /etc/collectd.conf
[root@node01 collectd-plugins]# diff /etc/collectd.conf /etc/collectd.conf.org
13c13
< Hostname    "192.168.56.11"
---
> #Hostname    "localhost"
23,43d22
< # sendto Graphite
< <LoadPlugin "python">
<     Globals true
< </LoadPlugin>
< <Plugin "python">
<     # carbon_writer.py is at path /opt/collectd-plugins/carbon_writer.py
<     ModulePath "/opt/collectd-plugins/"
<
<     Import "carbon_writer"
<
<     <Module "carbon_writer">
<         LineReceiverHost "192.168.56.11"
<         LineReceiverPort 2003
<         LineReceiverProtocol "tcp"
<         MetricPrefix "collectd"
<         DifferentiateCountersOverTime true
<         LowercaseMetricNames true
<         TypesDB "/usr/share/collectd/types.db"
<     </Module>
< </Plugin>
<

Configure collectd to start automatically, then start the service.

[root@node01 ~]# chkconfig collectd on
[root@node01 ~]# service collectd start
Starting collectd: Initializing carbon_writer client in TCP socket mode.
                                                           [  OK  ]

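The data then appears in the Graphite web UI.
The server shown under "collectd" is the one where collectd was installed above.

Select the metric you want to see and click "value" to add it to the graph on the right. Clicking a "value" that is already displayed removes it from the graph.

To view several graphs side by side, click "Dashboard" at the top right of the screen.
There you can build your own graphs and view them together.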
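
Besides the web UI, the same data can be requested through Graphite's render endpoint. The following is a sketch for Python 3 on a client machine; the metric path is a hypothetical example (the real path depends on the collectd plugins that are loaded), and the function name is illustrative:

```python
from urllib.parse import urlencode


def build_render_url(base_url, target, time_from="-1h", fmt="json"):
    """Build a Graphite /render URL for one target (metric path)."""
    query = urlencode({"target": target, "from": time_from, "format": fmt})
    return "%s/render?%s" % (base_url.rstrip("/"), query)


# "collectd.example.load" is a placeholder; use a path shown in the Graphite tree.
print(build_render_url("http://192.168.56.11/", "collectd.example.load"))
```

Opening the printed URL in a browser, or fetching it with curl, returns the last hour of data points as JSON if the metric exists.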

4. Logging in to Graphite

Now create the Graphite user that was skipped when the Graphite database was built above.

[root@node01 ~]# python /usr/lib/python2.6/site-packages/graphite/manage.py createsuperuser
/usr/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
  "use STATIC_URL instead.", DeprecationWarning)
/usr/lib/python2.6/site-packages/django/core/cache/__init__.py:82: DeprecationWarning: settings.CACHE_* is deprecated; use settings.CACHES instead.
  DeprecationWarning
Username (leave blank to use 'root'): admin
E-mail address: admin@example.com
Password:
Password (again):
Superuser created successfully.

To change the password later, run the following command.

python /usr/lib/python2.6/site-packages/graphite/manage.py changepassword

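Open the Graphite web UI and try logging in.
Click [Login] at the top right and, when the login screen appears, log in with the user name and password created above.

Once logged in, you can save the graphs you create.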

Saturday, March 19, 2016

[CentOS6] Timing out a command that freezes during an NFS failure

To confirm that an NFS mount is healthy, the plan is to run the ls command periodically on the NFS client.

# ls /mnt

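However, depending on the NFS settings, when the NFS server goes down the ls command can freeze and stop responding until the server recovers, so the check itself never completes.

To deal with the freeze without changing the NFS settings, use the timeout command.

Run ls through timeout so that the command is forcibly terminated if it does not finish within the specified time.

In the following example, ls is forcibly terminated if it does not finish within 5 seconds.

When the command times out, the exit status is 124.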

# timeout 5 ls /mnt

This way, a failure is detected about 5 seconds after the check runs against a downed NFS server.
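
If the check is driven from a script rather than from the shell, the same idea can be sketched in Python 3 with the standard subprocess module. The function name is illustrative, and returning 124 on timeout is a choice made here to mirror the timeout command, not something subprocess does by itself:

```python
import subprocess

TIMEOUT_STATUS = 124  # same value the timeout command reports


def run_with_timeout(command, seconds):
    """Run command; return its exit status, or 124 if the time limit is exceeded."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return TIMEOUT_STATUS
    return completed.returncode


# Example call for the NFS check (left commented out because it depends on the mount):
# status = run_with_timeout(["ls", "/mnt"], 5)
```

A monitoring script can then treat any non-zero status, including 124, as a failed check.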

Sunday, March 13, 2016

[fluentd] Apparently it can aggregate data and feed CloudWatch into Zabbix

Notes on some fluentd use cases that caught my attention.
Graphite looks interesting too.

◇ Aggregating data

http://qiita.com/moaikids/items/f9e41cebcd97a2d76579

◇ Feeding AWS CloudWatch into Zabbix

http://qiita.com/toshihirock/items/84954d927bb0e1eae470

◇ Feeding AWS CloudWatch into Graphite

https://blog.cloudpack.jp/2014/06/13/fluentd-cloudwatch-metrics-graphite/

◇ Feeding data into AWS Kinesis

http://qiita.com/imaifactory/items/a21d80f2bc017af0dabe

Saturday, March 12, 2016

[CentOS6][fluentd] Using Japanese with filter_grep

I tried out fluentd's filter_grep.
The test environment is as follows.

[root@node01 ~]# cat /etc/redhat-release
CentOS release 6.7 (Final)
[root@node01 ~]# cat /etc/sysconfig/i18n
LANG="ja_JP.UTF-8"

The fluentd configuration I tried is shown below. It uses Japanese in filter_grep.
It is meant to output only lines that contain the string "致命的" ("fatal").

## File input
<source>
  type tail
  format none
  path /tmp/dummy.log
  tag local.dummy
</source>

## Filter
<filter local.dummy>
  type grep
  regexp1 message 致命的
</filter>

## File output (/var/log/td-agent/td-agent.log)
<match local.**>
  type stdout
</match>

Contrary to expectations, however, using Japanese as the grep keyword does not work.
The following error appears in /var/log/td-agent/td-agent.log.

2016-03-12 12:43:59 +0900 [warn]: failed to grep events error_class=Encoding::CompatibilityError error="incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)"

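This appears to be caused by evaluating a regular expression against strings with different encodings.
in_tail produces strings tagged as "ASCII-8BIT", while the grep keyword is "UTF-8".
in_tail and filter_grep seem to be a poor match.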
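
The same class of problem can be shown outside Ruby. As a loose analogy only (this is Python, not fluentd code), matching a text pattern against raw bytes is rejected, and decoding the bytes as UTF-8 first makes the match work; the function name is illustrative:

```python
import re

KEYWORD = "致命的"


def contains_keyword(raw_line):
    """Return True if the raw log line (bytes) contains KEYWORD."""
    # A str pattern cannot be matched against bytes (Python raises TypeError),
    # so interpret the bytes as UTF-8 text before matching.
    text = raw_line.decode("utf-8", errors="replace")
    return re.search(KEYWORD, text) is not None


print(contains_keyword("xxxx致命的xxxx".encode("utf-8")))
print(contains_keyword(b"xxxxxxx"))
```

The fix described next follows the same idea: make both sides of the match agree on UTF-8.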

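Still, I really wanted to use Japanese in the grep keyword, so I modified filter_grep.

Changes

Copy the original filter_grep under a different name, modify it, and create a new plugin.
Name the new plugin ngrep.
Use UTF-8 as the encoding when the regular expression is evaluated.

Procedure

Copy the original filter_grep. The destination file name is filter_ngrep.rb.
If the /etc/td-agent/plugin directory does not exist, create it first.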

[root@node01 ~]# cp /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/filter_grep.rb /etc/td-agent/plugin/filter_ngrep.rb

Edit filter_ngrep.rb to change the plugin name.

module Fluent
  class NGrepFilter < Filter
    Fluent::Plugin.register_filter('ngrep', self)

Edit filter_ngrep.rb so that the data read by in_tail is treated as UTF-8 (force_encoding relabels the string without converting its bytes).

    def filter(tag, time, record)
      result = nil
      begin
        catch(:break_loop) do
          @regexps.each do |key, regexp|
            throw :break_loop unless ::Fluent::StringUtil.match_regexp(regexp, record[key].to_s.force_encoding('UTF-8'))
          end
          @excludes.each do |key, exclude|
            throw :break_loop if ::Fluent::StringUtil.match_regexp(exclude, record[key].to_s.force_encoding('UTF-8'))
          end

Verifying filter_ngrep

The settings in /etc/td-agent/td-agent.conf are as follows.
Specify ngrep as the type.
Otherwise it is used the same way as filter_grep.

<filter local.dummy>
  type ngrep
  regexp1 message 致命的
</filter>

After changing the configuration, I verified the behavior as follows.
Only the lines containing "致命的" were written to /var/log/td-agent/td-agent.log, so it appears to work.

[root@node01 ~]# service td-agent restart
Restarting td-agent:                                       [  OK  ]
[root@node01 ~]# echo "xxxxxxx" >> /tmp/dummy.log
[root@node01 ~]# echo "xxxxxxx" >> /tmp/dummy.log
[root@node01 ~]# echo "xxxx致命的xxxx" >> /tmp/dummy.log
[root@node01 ~]# echo "xxxx致命的xxxx" >> /tmp/dummy.log
[root@node01 ~]# echo "xxxxxxx" >> /tmp/dummy.log
[root@node01 ~]# echo "xxxxxxx" >> /tmp/dummy.log
[root@node01 ~]# tail /var/log/td-agent/td-agent.log
    type ngrep
    regexp1 message 致命的
  </filter>
  <match local.**>
    type stdout
  </match>
</ROOT>
2016-03-12 12:54:15 +0900 [info]: following tail of /tmp/dummy.log
2016-03-12 12:54:29 +0900 local.dummy: {"message":"xxxx致命的xxxx"}
2016-03-12 12:54:31 +0900 local.dummy: {"message":"xxxx致命的xxxx"}
[root@node01 plugin]#

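Sunday, March 6, 2016

[SOS JobScheduler] Downloadable job definition samples

The Job / JobChain samples (XML) created for the SOS JobScheduler articles in the "Technical Notes" section
can be downloaded from the link below.

Please treat them as reference material only.

[Sample download]

Job / Job Chain samples

[Notes]

Whether a sample works depends on the JobScheduler version.
The download is a zip archive of the live folder.
It has not been tidied up, so it may have gaps, extras, and leftover files.
There is no document mapping the articles to the samples.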

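Saturday, March 5, 2016

[CentOS6] Installing Pacemaker (Corosync)

This post installs Pacemaker (Corosync) on CentOS 6.7, following the Red Hat manual.

The Red Hat manual used as a reference is:

Red Hat Enterprise Linux 6 - Configuring the Red Hat High Availability Add-On with Pacemaker

The Pacemaker (Corosync) shipped with RHEL 6 is managed by CMAN, so it is configured somewhat differently from the Pacemaker (Corosync) shipped with RHEL 7. (The same applies to CentOS.)
/etc/corosync/corosync.conf is not used; the cluster is managed through /etc/cluster/cluster.conf instead.
The following page is a useful reference.

30分でRHEL6 High Availability Add-Onを超絶的に理解しよう! (Understand the RHEL6 High Availability Add-On in 30 minutes)

The environment used for the installation is as follows.

Server 1
- OS: CentOS 6.7
- IP address: 10.1.0.91
- Host name: pm21

Server 2
- OS: CentOS 6.7
- IP address: 10.1.0.92
- Host name: pm22

1. Installing Pacemaker

Install pacemaker, cman, and pcs with yum from the standard CentOS 6.7 repositories.
Install them on both Server 1 and Server 2.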

# yum install -y pacemaker cman pcs

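pcs is a CLI tool for cluster management.
Cluster operations with pcs only need to be run on one server that is a cluster member, so this post generally runs pcs commands on Server 1.

After installation, run the following command on Server 1 and Server 2 so that corosync does not start on its own without cman.

# chkconfig corosync off

cman should always start, so run the following command on Server 1 and Server 2. Setting CMAN_QUORUM_TIMEOUT to 0 lets cman start without waiting for quorum.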

# sed -i.sed "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" /etc/sysconfig/cman
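
To make the effect of the sed expression above explicit, here is the same substitution sketched in Python: any line containing `CMAN_QUORUM_TIMEOUT=`, commented out or not, is replaced by a single uncommented setting. The sample file content is an assumption for illustration, not the actual contents of /etc/sysconfig/cman:

```python
import re


def set_quorum_timeout(config_text, value=0):
    """Replace every line containing CMAN_QUORUM_TIMEOUT= with an uncommented setting."""
    return re.sub(
        r"^.*CMAN_QUORUM_TIMEOUT=.*$",
        "CMAN_QUORUM_TIMEOUT=%d" % value,
        config_text,
        flags=re.MULTILINE,
    )


# Hypothetical sample input; the real file may differ.
sample = "# seconds to wait for quorum\n#CMAN_QUORUM_TIMEOUT=45\nOTHER_OPTION=1\n"
print(set_quorum_timeout(sample))
```

Lines that do not mention the variable are left untouched, which matches what sed does with the original pattern.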

2. OS settings

Confirm that SELinux and the firewall are disabled on Server 1 and Server 2.

# getenforce
Disabled
# chkconfig --list | grep tables
ip6tables       0:off   1:off   2:off   3:off   4:off   5:off   6:off
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off
# iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

The cluster configuration uses host names, so register the host names in /etc/hosts on Server 1 and Server 2.

10.1.0.91 pm21
10.1.0.92 pm22
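
A small sketch for checking that both entries are present before moving on. It parses hosts-file text passed in as a string; the function name is illustrative and the default node names are the ones used in this post:

```python
def missing_hosts(hosts_text, required=("pm21", "pm22")):
    """Return the required host names that have no entry in the hosts text."""
    known = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        known.update(fields[1:])                # fields[0] is the IP address
    return [name for name in required if name not in known]


sample = "127.0.0.1 localhost\n10.1.0.91 pm21\n10.1.0.92 pm22\n"
print(missing_hosts(sample))
```

On a real server the text would come from reading /etc/hosts; an empty list means both nodes are registered.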

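Confirm connectivity in both directions between Server 1 and Server 2.

# ping -c1 pm21 (or pm22)

pcs uses ssh when configuring the cluster. Confirm that ssh login works in both directions between Server 1 and Server 2. The example below uses root, but any user is fine.

# ssh -l root pm21 (or pm22)

pcs commands run as the hacluster user, which has no password initially, so set one. Do this on both Server 1 and Server 2.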

# passwd hacluster
Changing password for user hacluster.
New password:
BAD PASSWORD: it is based on a dictionary word
Retype new password:
passwd: all authentication tokens updated successfully.

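3. Cluster configuration

Configure the cluster with pcs.
First, start pcsd on Server 1 and Server 2 and enable it at boot.

# service pcsd start
Starting pcsd:                                             [  OK  ]
# chkconfig pcsd on

Set up the trust relationship between Server 1 and Server 2. Authentication uses the hacluster user with the password set in step 2 above. Run this on Server 1 only.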

[root@pm21 ~]# pcs cluster auth pm21 pm22 -u hacluster -p p@ssw0rd
pm21: Authorized
pm22: Authorized

Set up the cluster with the name "my_cluster20" and start the cluster services on Server 1 and Server 2.
Run this on Server 1 only.

[root@pm21 ~]# pcs cluster setup --start --name my_cluster20 pm21 pm22
pm21: Updated cluster.conf...
pm22: Updated cluster.conf...
Starting cluster on nodes: pm21, pm22...
pm21: Starting Cluster...
pm22: Starting Cluster...

Configure the cluster services to start automatically on Server 1 and Server 2.

[root@pm21 ~]# pcs cluster enable --all
pm21: Cluster Enabled
pm22: Cluster Enabled

The /etc/cluster/cluster.conf file created on Server 1 and Server 2 looks like this.

[root@pm21 ~]# cat /etc/cluster/cluster.conf
<cluster config_version="9" name="my_cluster20">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="pm21" nodeid="1">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="pm21"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pm22" nodeid="2">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="pm22"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman broadcast="no" expected_votes="1" transport="udp" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk-redirect"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

With this configuration, Pacemaker logs are written to /var/log/messages.
To disable syslog so that nothing goes to /var/log/messages, and to switch to UDP unicast for AWS, change the file as follows.

[root@pm21 cluster]# cat /etc/cluster/cluster.conf
<cluster config_version="9" name="my_cluster20">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="pm21" nodeid="1">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="pm21"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pm22" nodeid="2">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="pm22"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman broadcast="no" expected_votes="1" transport="udpu" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk-redirect"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
  <logging to_syslog="no" to_logfile="yes" logfile_priority="info" debug="off"/>
</cluster>
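
After editing the file by hand, it can help to confirm that the XML still parses and that the intended attributes are set. A minimal sketch using Python's standard xml.etree module; the function name and the returned dictionary layout are illustrative:

```python
import xml.etree.ElementTree as ET


def summarize_cluster_conf(xml_text):
    """Extract the cluster name, node names, transport, and syslog flag from cluster.conf text."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    cman = root.find("cman")
    log_elem = root.find("logging")
    return {
        "name": root.get("name"),
        "nodes": [node.get("name") for node in root.iter("clusternode")],
        "transport": cman.get("transport") if cman is not None else None,
        "to_syslog": log_elem.get("to_syslog") if log_elem is not None else None,
    }


# Abbreviated sample based on the file shown above.
sample = (
    '<cluster config_version="9" name="my_cluster20">'
    '<clusternodes><clusternode name="pm21" nodeid="1"/>'
    '<clusternode name="pm22" nodeid="2"/></clusternodes>'
    '<cman broadcast="no" expected_votes="1" transport="udpu" two_node="1"/>'
    '<logging to_syslog="no" to_logfile="yes"/>'
    "</cluster>"
)
print(summarize_cluster_conf(sample))
```

For the edited file above, the expected summary has transport "udpu" and to_syslog "no".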

After the change, restart the cluster services.

[root@pm21 cluster]# pcs cluster stop --all
pm22: Stopping Cluster (pacemaker)...
pm21: Stopping Cluster (pacemaker)...
pm22: Stopping Cluster (cman)...
pm21: Stopping Cluster (cman)...
[root@pm21 cluster]# pcs cluster start --all
pm22: Starting Cluster...
pm21: Starting Cluster...

4. Starting the cluster and checking its status

The cluster service processes are as follows.

[root@pm21 cluster]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
(omitted)
root      2085     1  0 21:03 ?        00:00:00 corosync -f
root      2149     1  0 21:03 ?        00:00:00 fenced
root      2164     1  0 21:03 ?        00:00:00 dlm_controld
root      2226     1  0 21:03 ?        00:00:00 gfs_controld
root      2361     1  0 21:03 ?        00:00:00 pacemakerd
189       2367  2361  0 21:03 ?        00:00:00 /usr/libexec/pacemaker/cib
root      2368  2361  0 21:03 ?        00:00:00 /usr/libexec/pacemaker/stonithd
root      2369  2361  0 21:03 ?        00:00:00 /usr/libexec/pacemaker/lrmd
189       2370  2361  0 21:03 ?        00:00:00 /usr/libexec/pacemaker/attrd
189       2371  2361  0 21:03 ?        00:00:00 /usr/libexec/pacemaker/pengine
root      2372  2361  0 21:03 ?        00:00:00 /usr/libexec/pacemaker/crmd

Check the cluster status. Run this on Server 1.

[root@pm21 cluster]# pcs status cluster
Cluster Status:
 Last updated: Sat Mar  5 23:04:44 2016
 Last change: Sat Mar  5 21:31:24 2016
 Stack: cman
 Current DC: pm21 - partition with quorum
 Version: 1.1.11-97629de
 2 Nodes configured
 0 Resources configured

Check the status of the cluster members (nodes). Run this on Server 1.

[root@pm21 cluster]# pcs status nodes
Pacemaker Nodes:
 Online: pm21 pm22
 Standby:
 Offline:
[root@pm21 cluster]# pcs status corosync
Nodeid     Name
   1   pm21
   2   pm22

Check the status of the cluster, nodes, resources, and daemons. Run this on Server 1.

[root@pm21 cluster]# pcs status
Cluster name: my_cluster20
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sat Mar  5 23:05:23 2016
Last change: Sat Mar  5 21:31:24 2016
Stack: cman
Current DC: pm21 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
0 Resources configured


Online: [ pm21 pm22 ]

Full list of resources:

5. Checking the cluster configuration

Check the cluster configuration. Run this on Server 1.

[root@pm21 cluster]# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

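The errors appear because STONITH has not been configured. STONITH is reportedly a feature that forcibly powers the hardware off and on when Pacemaker detects a split-brain. The following document covers STONITH in detail.

HAクラスタをフェイルオーバ失敗から救おう！(PDF) (Rescuing HA clusters from failover failures)

STONITH is enabled by default, but it is not used in this virtualized test environment, so disable it.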

[root@pm21 cluster]# pcs property set stonith-enabled=false

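This did not produce an error, but the quorum setting is changed as well. The following covers quorum in detail.

Pacemakerを使いこなそう (Making the most of Pacemaker)

This is a two-node setup, so set no-quorum-policy to ignore so that loss of quorum triggers no special action even if a split-brain occurs.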

[root@pm21 cluster]# pcs property set no-quorum-policy=ignore

Check the parameters.

[root@pm21 cluster]# pcs property
Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.11-97629de
 no-quorum-policy: ignore
 stonith-enabled: false

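The cluster does not need to be restarted; the change takes effect immediately.

6. Cluster operations

Try operating the cluster with the pcs command. Here, pcs commands are run on Server 1.

To stop the cluster, run the following.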

[root@pm21 cluster]# pcs cluster stop --all
pm22: Stopping Cluster (pacemaker)...
pm21: Stopping Cluster (pacemaker)...
pm22: Stopping Cluster (cman)...
pm21: Stopping Cluster (cman)...

To restart the cluster, stop it and then start it.

[root@pm21 cluster]# pcs cluster stop --all && pcs cluster start --all

Put Server 1 into standby and check the status.

[root@pm21 cluster]# pcs cluster standby pm21
[root@pm21 cluster]# pcs status nodes
Pacemaker Nodes:
 Online: pm22
 Standby: pm21
 Offline:

Bring Server 1 back online.

[root@pm21 cluster]# pcs cluster unstandby pm21
[root@pm21 cluster]# pcs status nodes
Pacemaker Nodes:
 Online: pm21 pm22
 Standby:
 Offline:

Stop Server 2 only.

[root@pm21 cluster]# pcs cluster stop pm22
pm22: Stopping Cluster (pacemaker)...
pm22: Stopping Cluster (cman)...
[root@pm21 cluster]# pcs status nodes
Pacemaker Nodes:
 Online: pm21
 Standby:
 Offline: pm22
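
For scripted checks, the `pcs status nodes` output shown above can be turned into a small data structure. This sketch assumes the output format matches the pcs version used in this post; the function name is illustrative:

```python
def parse_pcs_nodes(output):
    """Parse 'pcs status nodes' text into a dict of node-name lists keyed by state."""
    states = {}
    for line in output.splitlines():
        label, sep, names = line.strip().partition(":")
        if sep and label in ("Online", "Standby", "Offline"):
            states[label] = names.split()
    return states


sample = "Pacemaker Nodes:\n Online: pm21\n Standby:\n Offline: pm22\n"
print(parse_pcs_nodes(sample))
```

A monitoring script could then raise an alert whenever the "Offline" list is not empty.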

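Wednesday, March 2, 2016

[aws][awspec] Testing AWS resources

There is apparently a tool called awspec that can test AWS resources (EC2 and so on) in the style of Serverspec.

Even better, it can apparently fetch information about existing AWS resources and generate test code automatically.

See the article below.

AWSのリソース構成をServerspecのようにテストする "awspec" をつくった (I built "awspec" to test AWS resource configurations like Serverspec)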
