May 10, 2007
SLES10 and megaraid_sas dkms
This is a quick write-up of the problem I had (and the eventual solution) installing the latest version of the megaraid_sas driver (as used by the Dell PowerEdge 1950/2950/etc family for their PERC5 RAID controllers) on SLES 10.
I was bitten by some recent problems with the RAID container on one of my PowerEdge 2950s, which were exacerbated by the fact that I had failed to apply some rather urgent firmware updates for the PERC5/i controller card. The PERC5/i is just a Dell-badged LSI SAS RAID card and thus uses the megaraid_sas driver, but I am jumping ahead a bit here. For the sake of those who follow the same naive course as I did, may you find this quickly with Google and save yourself some grief.
Having fallen foul of a failed RAID container, I quickly went out and found the latest firmware updates, which I knew existed but had yet to apply to the failed system. I applied them to that system and to a couple of others which I knew were also in need. Installed on a bootable flash drive, the updates went by rather quickly. Yay.
I reboot the system and all seems well, that is until I notice that syslog is running at full load on one CPU. So I look at my /var/log/messages and it is chock full of errors like this, at a rate of something like 100 lines per second:
May 10 09:30:03 doomed kernel: status = 1, message = 00, host = 0, driver = 08
May 10 09:30:03 doomed kernel: <6>sd: Current: sense key: Illegal Request
May 10 09:30:03 doomed kernel: Additional sense: Invalid command operation code
May 10 09:30:03 doomed kernel: FAILED
So this is very rapidly filling up my /var partition and making me unhappy. Very unhappy.
I thought this was perhaps related to a failed rebuild of my newly initialized RAID container... so I waited for it to finish 1.43TB of RAID building, and it still spewed away. Googling the error message didn't yield a whole lot in results. Eventually I looked on the Dell site to see if there were any other updates I was missing, and lo and behold, the day after they released the PERC5/i firmware update there was also a driver update.
I downloaded the driver. It requires Dell's DKMS (Dynamic Kernel Module Support), so I downloaded that as well. Install the DKMS rpm, no problem. Install the megaraid_sas rpm, no problem there either. It installs the module, does a mkinitrd, and waits for me to reboot. I reboot and all is hunky dory. No more crazy errors. I am happy and decide it is now time to rinse and repeat with my other two servers exhibiting the same problem.
No joy. When I go to the next box and install the megaraid_sas rpm I get:
Preparing... ########################################### [100%]
1:megaraid_sas ########################################### [100%]
Loading tarball for module: megaraid_sas / version: v00.00.03.09
Loading /usr/src/megaraid_sas-v00.00.03.09...
Creating /var/lib/dkms/megaraid_sas/v00.00.03.09/source symlink...
DKMS: ldtarball Completed.
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area....
make KERNELRELEASE=2.6.16.27-0.9-smp -C /lib/modules/2.6.16.27-0.9-smp/build SUBDIRS=/var/lib/dkms/megaraid_sas/v00.00.03.09/build modules....(bad exit status: 2)
Error! Bad return status for module build on kernel: 2.6.16.27-0.9-smp (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/ for more information.
Error! Could not locate megaraid_sas.ko for module megaraid_sas in the DKMS tree.
You must run a dkms build for kernel 2.6.16.27-0.9-smp (x86_64) first.
error: %post(megaraid_sas-v00.00.03.09-1.noarch) scriptlet failed, exit status 4
WTF?!?
It turns out that the first box I updated was actually a pristine install of SLES10 as released. No kernel updates, no nothing. So all the rpm ended up doing there, it seems, was unpacking a precompiled module from somewhere and stuffing it in the right place. Hell, I didn't even have the kernel-source rpm installed on it.
The referenced make.log looks like:
DKMS make.log for megaraid_sas-v00.00.03.09 for kernel 2.6.16.27-0.9-smp (x86_64)
Thu May 10 21:19:18 EDT 2007
make: Entering directory `/usr/src/linux-2.6.16.27-0.9-obj/x86_64/smp'
make -C ../../../linux-2.6.16.27-0.9 O=../linux-2.6.16.27-0.9-obj/x86_64/smp modules
  CC [M] /var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.o
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.c: In function 'megasas_probe_one':
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.c:2629: error: 'IRQF_SHARED' undeclared (first use in this function)
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.c:2629: error: (Each undeclared identifier is reported only once
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.c:2629: error: for each function it appears in.)
make[3]: *** [/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.o] Error 1
make[2]: *** [_module_/var/lib/dkms/megaraid_sas/v00.00.03.09/build] Error 2
make[1]: *** [modules] Error 2
make: *** [modules] Error 2
make: Leaving directory `/usr/src/linux-2.6.16.27-0.9-obj/x86_64/smp'
If I take a look in /var/lib/dkms/megaraid_sas/v00.00.03.09 I see that there is a patches directory, and in there a sles10-ga.patch. Part of the patch file says:
- if (request_irq(pdev->irq, megasas_isr, IRQF_SHARED, "megasas", instance)) {
+ if (request_irq(pdev->irq, megasas_isr, SA_SHIRQ, "megasas", instance)) {
Well, that is odd, because the make log was complaining about IRQF_SHARED... so it seems the patch was not getting applied to the source at all. K-lame. Whatever detection mechanism there is for the distro you are running has obviously failed here. So a little of this:
# patch < patches/sles10-ga.patch
patching file megaraid_sas.c
# dkms build -m megaraid_sas -v v00.00.03.09
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area....
make KERNELRELEASE=2.6.16.27-0.9-smp -C /lib/modules/2.6.16.27-0.9-smp/build SUBDIRS=/var/lib/dkms/megaraid_sas/v00.00.03.09/build modules....
cleaning build area....
DKMS: build Completed.
and presto, the module builds. Complete the task with a
# dkms install -m megaraid_sas -v v00.00.03.09
Running module version sanity check.
megaraid_sas.ko:
 - Original module
   - Found /lib/modules/2.6.16.27-0.9-smp/kernel/drivers/scsi/megaraid//megaraid_sas.ko
   - Storing in /var/lib/dkms/megaraid_sas/original_module/2.6.16.27-0.9-smp/x86_64/
   - Archiving for uninstallation purposes
 - Installation
   - Installing to /lib/modules/2.6.16.27-0.9-smp/kernel/drivers/scsi/megaraid//
/etc/modprobe.conf: added alias reference for 'megaraid_sas'
depmod.....
Saving old initrd as /boot/initrd-2.6.16.27-0.9-smp_old
Making new initrd as /boot/initrd-2.6.16.27-0.9-smp
(If next boot fails, revert to the _old initrd image)
mkinitrd.....
DKMS: install Completed.
and everything is good to go. Yay. Reboot. Doom averted. Go home.
A co-worker of mine subsequently came up with a much neater solution to this:
The fix is to make dkms aware of newer SLES 10 kernel versions by doing the following:
vi /usr/src/megaraid_sas-v00.00.03.09/dkms.conf
Change the following lines:
PATCH[6]="sles10-ga.patch"
PATCH_MATCH[6]="2\.6\.16\.21-0\.8"
to:
PATCH[6]="sles10-ga.patch"
PATCH_MATCH[6]="2\.6\.16\.2.-0\..*"
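To see why the original pattern skipped the patch on the updated boxes, here is a minimal Python sketch that tests both PATCH_MATCH values against a few kernel version strings. It assumes DKMS treats PATCH_MATCH as an unanchored regular expression checked against the kernel version, which re.search only approximates. The function name patch_applies is made up for illustration, and the "-smp" suffix on the GA kernel and the 2.6.16.46-0.12-smp string are hypothetical examples rather than values taken from the logs above.

```python
import re

# PATCH_MATCH values copied from the dkms.conf lines above.
ORIGINAL_MATCH = r"2\.6\.16\.21-0\.8"
WIDENED_MATCH = r"2\.6\.16\.2.-0\..*"


def patch_applies(pattern: str, kernel_version: str) -> bool:
    """Return True if the pattern is found anywhere in the version string.

    Assumption: DKMS tests PATCH_MATCH as an unanchored regular
    expression; re.search is used here to approximate that behaviour.
    """
    return re.search(pattern, kernel_version) is not None


kernels = (
    "2.6.16.21-0.8-smp",   # kernel the original pattern targets (suffix assumed)
    "2.6.16.27-0.9-smp",   # updated kernel from the failed build above
    "2.6.16.46-0.12-smp",  # hypothetical later kernel, for illustration only
)

for kernel in kernels:
    print(
        kernel,
        "original:", patch_applies(ORIGINAL_MATCH, kernel),
        "widened:", patch_applies(WIDENED_MATCH, kernel),
    )
```

Note that the widened pattern only covers 2.6.16.2x kernels, so a later update outside that range would presumably need the pattern loosened again.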
Posted by delgado at May 10, 2007 8:48 PM
Comments
Thank you so much for the elegant write-up. I was about going out of my mind trying to google a solution. I had a custom kernel installed, so I also had trouble building the module. Needed to get the kernel source rpm installed and ensure that the /lib/modules//build and /lib/modules//source soft links were pointed correctly to the sources. Thanks again.
Posted by: Bill at July 13, 2007 2:49 PM
Here at work, we have varying dell poweredges. PowerEdge 750, 860, 2650, SC1435..etc. They all primarily run Enterprise Linux. I like the utility "afacli", which allows manipulation of the hardware RAID controller and such. It was semi standardized until we bought new servers. Is there a general linux solution that can do the same thing or an alternative program/script to install that would work with most raid controllers or just with the Perc 5/SAS on the new servers. Thanks!
Posted by: Brandon Clinger at July 17, 2007 3:07 PM
Thanx man, it works. Saved me a lot(!) of time.
Although the purpose of Dell's dkms project is very elegant, I hope the patch will also be in the mainstream kernel and not only in a separate module which must be dkms-based (and also suffers the patch error). Perhaps it already is, but I use the SLES10 kernel for now.
Posted by: Jorden at October 16, 2007 8:24 AM