Posts Tagged ‘nagios’

check_raid_amrstat Nagios plugin

Friday, July 30th, 2010

We recently recycled a Dell PowerEdge 1750 equipped with a PERC 4/Di RAID controller into my realm and have it running FreeBSD 8. The PERC 4/Di is a rebranded LSI MegaRAID controller and uses the amr driver under FreeBSD.

There is an appropriately titled “check_raid_amrstat – Dell AMR PERC4 FreeBSD” plugin already on Nagios Exchange and Monitoring Exchange, but as usual I can’t seem to be content with other people’s code.

I cleaned up the code a bit. It now sticks more closely to the plugin writing guidelines and presents a bit more info than the original in a more compact format, with output similar to my MegaRAID SAS plugin. Nothing radical at all, but I like it better and present it here in case you want to see an alternative.

Nagios Plugin Repository

Thursday, July 2nd, 2009

It seems I have missed out on some drama related to Nagios recently. I went to take a look at the listing for the Nagios plugins I have put out there on nagiosexchange.org only to discover that they have now rebranded themselves as monitoringexchange.org and are run by a group that is now dedicated to a fork of Nagios called Icinga. How confusing… and now there is a new plugin site at exchange.nagios.org.

I have no clue about which of these two sites is going to end up being the de facto source for Nagios plugins, but I have claimed what is mine on the Nagios branded one and will probably maintain things in parallel for the time being. Here are the links there for my LSI MegaRAID SAS (Dell PERC5 & PERC6) plugin, check_megaraid_sas, and my Perforce license plugin, check_p4_license.

Nagios plugin for Perforce license

Wednesday, April 8th, 2009

I feel like whatever code I write, no matter how simple or seemingly insignificant, eventually ends up becoming useful to someone else at some point. So here is a quick Nagios plugin that I whipped up today, check_p4_license.

I have a server running Perforce at work and it seems that every other year I have managed to not realize the license needed renewing until someone said to me, “Hey, is there any reason Perforce isn’t working?”

This year my spider-senses started tingling earlier on and I caught it in time, but I told myself I needed to count on more than just an odd sense of doom and foreboding in April. So I wrote a quick Perl script that calls the Perforce p4 CLI tool, looks at your server’s license info, and gripes back to you if it is going to be expiring any time soon.
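The idea is simple enough to sketch. The snippet below is in Python rather than the Perl of the actual plugin, and it is only an illustration, not the plugin's code. It assumes the license info contains an expiry date written as `expires YYYY/MM/DD`; that format, the function name `check_license`, and the default thresholds of 30 and 7 days are all assumptions made for the example.

```python
import re
from datetime import date, datetime

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_license(license_line, today, warn_days=30, crit_days=7):
    """Return (exit_code, message) for one line of license info.

    Assumes the expiry date appears as 'expires YYYY/MM/DD'
    (an assumption for this sketch, not a documented format).
    """
    match = re.search(r"expires (\d{4}/\d{2}/\d{2})", license_line)
    if match is None:
        return UNKNOWN, "UNKNOWN: no expiry date found in license info"

    expires = datetime.strptime(match.group(1), "%Y/%m/%d").date()
    days_left = (expires - today).days

    if days_left < 0:
        return CRITICAL, f"CRITICAL: license expired {-days_left} days ago"
    if days_left <= crit_days:
        return CRITICAL, f"CRITICAL: license expires in {days_left} days"
    if days_left <= warn_days:
        return WARNING, f"WARNING: license expires in {days_left} days"
    return OK, f"OK: license expires in {days_left} days"
```

A real plugin would get the license line by running the p4 tool, print the message, and exit with the returned code so Nagios can act on it.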

I have also placed this plugin up on NagiosExchange, because I have found that it is better for me to do so earlier on than to have someone else find my stuff, upload it, and then abandon it.

check_megaraid_sas Nagios plugin

Thursday, June 7th, 2007

This is somewhat related to my earlier posting about updating the megaraid drivers. I use Nagios at work for system monitoring and one thing that I like to check is the status of the volumes managed by the RAID controller. When I first started configuring Nagios on my new PowerEdge 1950 and 2950 systems I found a check_perc5i over on Nagios Exchange.

Unfortunately the plugin only looked like it worked properly. It would report back correctly things like the number of volumes you had online, the number of disks, failed disks etc., but if you had a failed disk it would not actually return the proper error status. It just kept on going blindly saying OK : Bad Disks=3.
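The underlying issue is that Nagios acts on the plugin's exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN), not on the text it prints. Here is a minimal Python sketch of keeping the two in step. The function name `raid_result` and the rule that any failed disk is CRITICAL are assumptions for illustration, not taken from either plugin.

```python
# Standard Nagios plugin exit codes and their labels.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def raid_result(failed_disks):
    """Return (exit_code, message), deriving both from the same count.

    Building the label from the exit code means the text can never
    say OK while the disk count says otherwise.
    """
    if failed_disks < 0:
        code = UNKNOWN          # a negative count means the probe went wrong
    elif failed_disks > 0:
        code = CRITICAL         # assumed policy: any failed disk is critical
    else:
        code = OK
    return code, f"{LABELS[code]}: Bad Disks={failed_disks}"
```

A plugin would then print the message and pass the code to `sys.exit()`, so three bad disks come back as `CRITICAL: Bad Disks=3` with exit status 2 instead of a cheerful OK.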

So I have written my own script to check the RAID controller status, check_megaraid_sas. It is somewhat similar to the work I did for the PERC3Di with afacli and Nagios quite a while back.

In order to use it you need to have LSI’s MegaCli utility installed and the user executing the script will need to have sudo privileges (w/o a password) to execute it. Then you will end up with output like:
OK: 0:0:RAID-1:2 drives:68GB:Optimal 1:0:RAID-5:7 drives:2792GB:Optimal Drives:10 Hotspare(s):1
or (less good)
WARNING: 0:0:RAID-1:2 drives:74GB:Optimal 0:1:RAID-5:4 drives:1396GB:Optimal Drives:6 (3 Errors)

The warning is due to the detection of “other” disk errors on the drives. I am trying to find out from Dell if I can reset this count in the controller. Otherwise, if it is cumulative, I will probably modify my code to take an argument for a threshold under which to ignore non-fatal errors. The output above is basically in the form:
<status> <controller #>:<volume #>:<RAID level>:<volume drive count>:<volume size>:<volume status> ... Drives:<total drives attached to controller(s)>
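
If you want to post-process that line somewhere else, the format is regular enough to pick apart. This is a hedged Python sketch based only on the two sample lines shown above; the function name `parse_status_line` and the dictionary keys are made up for the example, and sizes are assumed to always be reported in GB.

```python
import re

# One volume entry, e.g. "1:0:RAID-5:7 drives:2792GB:Optimal".
VOLUME_RE = re.compile(r"(\d+):(\d+):(RAID-\d+):(\d+) drives:(\d+)GB:(\w+)")


def parse_status_line(line):
    """Split a check_megaraid_sas style status line into its parts."""
    status = line.split(":", 1)[0]  # text before the first colon
    volumes = [
        {
            "controller": int(m.group(1)),
            "volume": int(m.group(2)),
            "raid_level": m.group(3),
            "drives": int(m.group(4)),
            "size_gb": int(m.group(5)),
            "state": m.group(6),
        }
        for m in VOLUME_RE.finditer(line)
    ]
    # The capital D separates the controller total from the per-volume counts.
    total = re.search(r"Drives:(\d+)", line)
    return {
        "status": status,
        "volumes": volumes,
        "total_drives": int(total.group(1)) if total else None,
    }
```

Feeding it the first sample line gives a status of `OK`, two volume entries, and a total of 10 drives.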