These docs describe the basic process for monitoring a Dell PERC3Di controller (as found in the PowerEdge 1650) via Nagios and afacli under SuSE Linux Enterprise Server 9 (SLES9).
For the uninitiated, Nagios is a super useful open source tool for monitoring network services and the like. You can find the full details on it at the Nagios home page.
Also, these directions would presumably work for any other system, Dell PowerEdge or not, with the same family of Adaptec RAID controllers, which use the aacraid driver and can thus be monitored via the afacli utility.
As always, any comments, code enhancements, etc. that you might have are appreciated.
So, I've got a rack full of Dell PowerEdge servers... mostly 1650s and 1750s. They have nifty RAID controllers, but we hadn't really been monitoring them actively beyond the occasional check of the status lights on the systems. There's not much point in having a RAID if you don't know when it stops having redundancy.
Now, with Dell it would seem that if I ran Red Hat in one of their preferred releases, I would be able to use some of the canned Dell management tools for Linux. One problem (of many) is that I am lazy and I didn't want to go through the whole hassle of trying to get the Dell management solution running under SLES. The other problem is that I just don't trust running the Dell stuff; besides, I already have Nagios installed and it rocks.
The basic way that things work is like so:
Getting all of this to work requires three basic parts: afacli itself, the Nagios pieces (nrpe) that let afacli be run remotely, and the check_afacli plugin that ties it into centralized monitoring.
afacli is the command line interface (thus the cli in afacli) for the Adaptec RAID controller that Dell uses as their PERC 3Di. There are links to some RPMs for it from Dell's Linux RAID page. The most recent version listed on that page (at the time of writing) includes the afaapps-2.7 RPM. 2.7 works fine, but whoever built it is a real tool and managed to leave some dependencies on audio libs (WTF???) in the package. So, if you use that, you actually need to install the arts RPM.
Otherwise, you want to find afaapps-2.8, which is less broken. I found that with the afacli that comes with 2.8, I really needed to run sh MAKEDEV.afa afa0 (using the MAKEDEV.afa script provided in the RPM) to create the appropriate device. This was not an issue with 2.7.
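If you're unsure whether that MAKEDEV step is needed, you can check for the device node first. A minimal Python sketch, assuming the node MAKEDEV.afa creates is a character device at /dev/afa0 (both the path and the device type are assumptions inferred from the afa0 name, so check what the script actually makes). The function name is hypothetical:

```python
import os
import stat


def afa_device_present(path="/dev/afa0"):
    """Return True if path exists and is a character device node."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing (or unreachable) path: treat as absent
    return stat.S_ISCHR(mode)


if __name__ == "__main__":
    if not afa_device_present():
        print("No /dev/afa0 character device; try: sh MAKEDEV.afa afa0")
```

It only stats the path, so it does not need access to the controller itself.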
I can't and won't go into the details of setting up Nagios monitoring; please refer to the Nagios home page for that. For the purposes of this doc, I am assuming that you are monitoring the RAID remotely. If it is a local RAID, then you can obviously cut out many steps.
There isn't much Nagios-wise that needs to be installed on the system being monitored. Basically, you need to install all of the glue that enables remote execution of the Nagios plugin and gathering of its results. SLES9 comes with a nagios-plugins-1.3.1 RPM, which I installed. This gives me some Perl libs that my plugin depends upon, plus other plugins that I would want to use anyway.
Because I am checking the state of the RAID remotely, I need to set up a daemon on the system to answer requests to check on the RAID. The tool used for this is nrpe (Nagios Remote Plugin Executor), which can be downloaded from the Nagios: Extras and Addons page. It is pretty trivial to build and install. Be sure to create an unprivileged nagios user and group for nrpe to run under as a daemon.
nrpe needs to be configured with info on which commands it will accept and what it does when they are called, so my nrpe.cfg has the following line in it to call my plugin:
command[check_afacli]=/usr/lib/nagios/plugins/check_afacli
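nrpe will only run commands that are defined this way. check_nrpe sends a command name (its -c argument on the server side), nrpe looks that name up among its command[...] entries, and an unknown name is rejected rather than executed. Here is a toy Python model of that lookup. It is purely illustrative: nrpe itself is written in C, and these function names are made up.

```python
import re

# nrpe.cfg command definitions look like: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^command\[([^\]]+)\]=(.*)$")


def parse_nrpe_commands(text):
    """Collect {name: command_line} from the command[...] lines of an nrpe.cfg."""
    commands = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2).strip()
    return commands


def resolve(commands, requested):
    """Return the command line for a requested name, or None if it is not defined."""
    return commands.get(requested)


example = "command[check_afacli]=/usr/lib/nagios/plugins/check_afacli\n"
print(resolve(parse_nrpe_commands(example), "check_afacli"))
```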
The Nagios server has to know how to call check_afacli on the remote system, so my checkcommands.cfg has an entry like:
define command{
        command_name    check_afacli
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_afacli -t 30
        }
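When the service check fires, Nagios fills in the $...$ macros before running the command line. $USER1$ is normally defined in resource.cfg as the plugin directory, and $HOSTADDRESS$ comes from the host being checked. The -t 30 option raises check_nrpe's timeout to 30 seconds. The Python sketch below only imitates that substitution to show the resulting command, using made-up values (192.0.2.10 is a documentation address). It is not how Nagios itself is implemented, and expand_macros is a hypothetical name:

```python
import re

# Nagios macros are written $NAME$.
MACRO_RE = re.compile(r"\$([A-Z0-9_]+)\$")


def expand_macros(command_line, macros):
    """Substitute known $NAME$ macros, leaving unknown ones untouched."""
    return MACRO_RE.sub(lambda m: macros.get(m.group(1), m.group(0)), command_line)


template = "$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_afacli -t 30"
# Hypothetical values: your USER1 path and host address will differ.
values = {"USER1": "/usr/lib/nagios/plugins", "HOSTADDRESS": "192.0.2.10"}
print(expand_macros(template, values))
# prints: /usr/lib/nagios/plugins/check_nrpe -H 192.0.2.10 -c check_afacli -t 30
```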
There isn't a whole lot to say about my script, check_afacli. It is written in my unpolished Perl. I would like to think that the code is sound, but my regex may be ugly. You have been warned.
If you decide to adopt it for your own use, you will need to customize any paths to required files as needed, of course.
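The script itself isn't reproduced here, so it may help to see the contract that any Nagios plugin has to meet: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal Python sketch of that final step. It is not the author's Perl, and the container state words are hypothetical, so check them against your own afacli log. The function names are also made up:

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
# Ranking used to pick the worst result; CRITICAL outranks UNKNOWN here.
RANK = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}

# Hypothetical container state words; compare with your own afacli log.
STATE_SEVERITY = {"valid": OK, "rebuild": WARNING, "degraded": CRITICAL, "dead": CRITICAL}


def evaluate(states):
    """Map container state strings to an (exit_code, status_line) pair."""
    if not states:
        return UNKNOWN, "RAID UNKNOWN - no containers found"
    worst = OK
    problems = []
    for index, state in enumerate(states):
        word = state.strip()
        code = STATE_SEVERITY.get(word.lower(), UNKNOWN)  # unrecognized -> UNKNOWN
        if code != OK:
            problems.append(f"container {index}: {word}")
        if RANK[code] > RANK[worst]:
            worst = code
    if worst == OK:
        return OK, f"RAID OK - {len(states)} container(s) valid"
    return worst, f"RAID {NAMES[worst]} - " + ", ".join(problems)


def report(states, out=None):
    """Print the one status line Nagios displays and return the exit code to use."""
    code, line = evaluate(states)
    print(line, file=out)  # out=None means sys.stdout
    return code
```

A plugin built on this would end with sys.exit(report(states)). Nagios takes the service state from the exit status and the displayed text from the first line of output.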
Also, the script is executed as the nagios user by the nrpe daemon, so you need to be sure that the nagios user has permission to run afacli. I did this by allowing the nagios user to sudo afacli without requiring a password, so my sudoers file has a line like this:
nagios ALL=(ALL) NOPASSWD: /sbin/afacli
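With that sudoers entry in place, the plugin can drive afacli non-interactively by feeding it a command file (the afascript described next) on standard input. This is the equivalent of sudo /sbin/afacli < afascript. Below is a rough Python sketch of that step, not the author's Perl. Several details are assumptions: the afascript location, the 25-second timeout (picked to stay under check_nrpe's -t 30), and that afacli exits non-zero on failure. run_afascript is a hypothetical name:

```python
import subprocess

# Matches the sudoers entry: nagios may run /sbin/afacli without a password.
AFACLI = ["sudo", "/sbin/afacli"]
# Hypothetical location for the afascript command file.
AFASCRIPT = "/usr/lib/nagios/plugins/afascript"
# Log path requested by the "logfile start" line of afascript.
LOGFILE = "/tmp/afacli.log"


def run_afascript(script_path=AFASCRIPT, log_path=LOGFILE, runner=subprocess.run, timeout=25):
    """Feed afascript to afacli on stdin, then return the text of its log file."""
    with open(script_path, "rb") as script:
        # Equivalent to: sudo /sbin/afacli < afascript
        result = runner(AFACLI, stdin=script, stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL, timeout=timeout)
    if result.returncode != 0:
        # Assumes afacli signals failure through its exit status.
        raise RuntimeError(f"afacli exited with status {result.returncode}")
    with open(log_path, encoding="utf-8", errors="replace") as log:
        return log.read()
```

The runner parameter exists only so the subprocess call can be swapped out for testing. If afacli aborts before writing /tmp/afacli.log, the final open() raises an error, which a real plugin should turn into an UNKNOWN result.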
The plugin redirects a set of commands to afacli from a file called afascript, which looks like:
logfile start '/tmp/afacli.log'
open afa0
controller details
container list /all /full
enclosure show slot
close
logfile end
exit
Yes, the spaces do seem to be required in there; that isn't just indenting for the sake of it. You could also add more commands to be passed to afacli, but check_afacli won't do much with them. What it does do is:
Some things I need to work on with this:
I couldn't get afacli to run in a non-interactive terminal session. I could run check_afacli on the local system just fine from the command line, but as soon as nrpe tried to run it, it would abort before afacli could execute any commands. Likewise, I couldn't run afacli in any cron jobs, etc. So now that I have got things working with SLES9, I need to look at what may have changed and made this difference. I suspect that it is some sort of libncurses SNAFU.
The docs for afacli are rather poor, so much of this is going to have to be done through experimentation.
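One way to narrow down that difference is to compare the environment afacli sees when run by hand with the one it gets under nrpe or cron. Here is a small diagnostic sketch in Python (hypothetical, not part of check_afacli). It reports whether each standard stream is a terminal and what TERM is set to:

```python
import json
import os
import sys


def terminal_report(streams=None, env=None):
    """Describe the terminal environment: TTY status of each stream plus TERM."""
    if streams is None:
        streams = {"stdin": sys.stdin, "stdout": sys.stdout, "stderr": sys.stderr}
    if env is None:
        env = os.environ
    report = {}
    for name, stream in streams.items():
        # A stream can be None (e.g. its descriptor was closed); count that as "not a TTY".
        report[name] = bool(stream is not None and stream.isatty())
    report["TERM"] = env.get("TERM")
    return report


if __name__ == "__main__":
    # One JSON line, so it also shows up as plugin output when run through nrpe.
    print(json.dumps(terminal_report()))
```

Run it once from a shell and once through nrpe or cron. The comparison should show whether the failing runs lack a TTY or a TERM value, which would fit the libncurses theory.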