
May 10, 2007

SLES10 and megaraid_sas dkms

This is a quick write-up of the problem I had (and subsequent solution) with installing the latest version of the megaraid_sas driver (as used by the Dell PowerEdge 1950/2950/etc family for their PERC5 RAID controllers) on SLES 10.

I was bitten recently by a failure of the RAID container on one of my PowerEdge 2950s, a problem exacerbated by the fact that I had failed to apply some rather urgent firmware updates for the PERC5/i controller card. The PERC5/i is just a Dell-badged LSI SAS RAID card and thus uses the megaraid_sas driver, but I am jumping ahead a bit here. For the sake of those who follow the same naive course as I did, may you find this quickly with Google and save yourself some grief.

Having fallen foul of a failed RAID container, I quickly went out and found the latest firmware update, which I knew existed but had yet to apply to the failed system. I applied it to that system and to a couple of others that I knew were also in need. Installed from a bootable flash drive, the updates went by rather quickly. Yay.

I reboot the system and all seems well, that is, until I notice that syslog is running at full load on one CPU. So I look at my /var/log/messages and it is chock full of errors like this, at a rate of about 100 lines per second:

May 10 09:30:03 doomed kernel:   status = 1, message = 00, host = 0, driver = 08
May 10 09:30:03 doomed kernel:   <6>sd: Current: sense key: Illegal Request
May 10 09:30:03 doomed kernel:     Additional sense: Invalid command operation code
May 10 09:30:03 doomed kernel: FAILED

So this is very rapidly filling up my /var partition and making me unhappy. Very unhappy.
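For anyone wanting to put a number on a flood like this, here is a minimal sketch that counts log lines per second. It assumes the usual syslog timestamp layout ("Mon DD HH:MM:SS" in the first three fields); the function name busiest_seconds is made up for illustration.

```shell
#!/bin/sh
# Count syslog lines per second to see how fast a log is being flooded.
# busiest_seconds is a hypothetical helper name, not part of any tool.
busiest_seconds() {
    # $1 = syslog-style file whose lines start with "Mon DD HH:MM:SS"
    # Tally lines per timestamp, then show the five busiest seconds.
    awk '{ c[$1" "$2" "$3]++ } END { for (k in c) print c[k], k }' "$1" |
        sort -rn | head -n 5
}

log=/var/log/messages
if [ -r "$log" ]; then
    busiest_seconds "$log"
fi
```

Each output line is a count followed by the timestamp it belongs to, busiest first.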

I thought this was perhaps related to a failed rebuild of my newly initialized RAID container... so I waited for it to finish 1.43TB of RAID building, and it still spewed away. Googling the error message didn't yield a whole lot in results. Eventually I looked on the Dell site to see if there were any other updates I was missing, and lo and behold, the day after they released the PERC5/i firmware update there was also a driver update.

I downloaded the driver. It requires Dell's DKMS (Dynamic Kernel Module Support) package, so I download that as well. Install the DKMS rpm, no problem. Install the megaraid_sas rpm, no problem there either. It installs the module, does a mkinitrd, and waits for me to reboot. I reboot and all is hunky dory. No more crazy errors. I am happy and decide it is now time to rinse and repeat with my other two servers exhibiting the same problem.

No joy. When I go to the next box and install the megaraid_sas rpm I get:

Preparing...                ########################################### [100%]
   1:megaraid_sas           ########################################### [100%]

Loading tarball for module: megaraid_sas / version: v00.00.03.09

Loading /usr/src/megaraid_sas-v00.00.03.09...
Creating /var/lib/dkms/megaraid_sas/v00.00.03.09/source symlink...

DKMS: ldtarball Completed.

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area....
make KERNELRELEASE=2.6.16.27-0.9-smp -C /lib/modules/2.6.16.27-0.9-smp/build SUBDIRS=/var/lib/dkms/megaraid_sas/v00.00.03.09/build modules....(bad exit status: 2)

Error! Bad return status for module build on kernel: 2.6.16.27-0.9-smp (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/ for more information.

Error! Could not locate megaraid_sas.ko for module megaraid_sas in the DKMS tree.
You must run a dkms build for kernel 2.6.16.27-0.9-smp (x86_64) first.
error: %post(megaraid_sas-v00.00.03.09-1.noarch) scriptlet failed, exit status 4

WTF?!?

It turns out that the first box I updated was actually a pristine install of SLES10 as released. No kernel updates, no nothing. So all the rpm ended up doing there, it seems, was unpacking a precompiled module from somewhere and stuffing it in the right place. Hell, I didn't even have the kernel-source rpm installed on it.
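A quick way to tell whether a box can compile modules at all is to check for the build tree that the make command above is pointed at with -C. This is only a sketch: build_tree_status is a made-up helper name, and the /lib/modules/<release>/build layout is taken from the make line in the rpm output.

```shell
#!/bin/sh
# Check that the kernel build tree DKMS hands to "make -C" is present.
# build_tree_status is a hypothetical helper name.
build_tree_status() {
    # $1 = modules root (normally /lib/modules), $2 = kernel release
    # -d follows symlinks, so a dangling "build" link counts as missing.
    if [ -d "$1/$2/build" ]; then
        echo "ok"
    else
        echo "missing"
    fi
}

kver=$(uname -r)
echo "build tree for $kver: $(build_tree_status /lib/modules "$kver")"
```

If it reports "missing", installing the kernel source package for the running kernel is the likely next step before any DKMS build can work.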

The referenced make.log looks like:

DKMS make.log for megaraid_sas-v00.00.03.09 for kernel 2.6.16.27-0.9-smp (x86_64)
Thu May 10 21:19:18 EDT 2007
make: Entering directory `/usr/src/linux-2.6.16.27-0.9-obj/x86_64/smp'
make -C ../../../linux-2.6.16.27-0.9 O=../linux-2.6.16.27-0.9-obj/x86_64/smp modules
  CC [M]  /var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.o
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.c: In function 'megasas_probe_one':
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.c:2629: error: 'IRQF_SHARED' undeclared (first use in this function)
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.c:2629: error: (Each undeclared identifier is reported only once
/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.c:2629: error: for each function it appears in.)
make[3]: *** [/var/lib/dkms/megaraid_sas/v00.00.03.09/build/megaraid_sas.o] Error 1
make[2]: *** [_module_/var/lib/dkms/megaraid_sas/v00.00.03.09/build] Error 2
make[1]: *** [modules] Error 2
make: *** [modules] Error 2
make: Leaving directory `/usr/src/linux-2.6.16.27-0.9-obj/x86_64/smp'

If I take a look in /var/lib/dkms/megaraid_sas/v00.00.03.09 I see that there is a patches directory, and in there a sles10-ga.patch. Part of the patch file says:

-       if (request_irq(pdev->irq, megasas_isr, IRQF_SHARED, "megasas", instance)) {
+       if (request_irq(pdev->irq, megasas_isr, SA_SHIRQ, "megasas", instance)) {

Well, that is odd, because the make log was complaining about IRQF_SHARED... so it seems the patch was not getting applied to the source at all. K-lame. Whatever detection mechanism there is for the distro you are running has obviously failed here. So a little of this:

# patch < patches/sles10-ga.patch
patching file megaraid_sas.c
# dkms build -m megaraid_sas -v v00.00.03.09
 
Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area....
make KERNELRELEASE=2.6.16.27-0.9-smp -C /lib/modules/2.6.16.27-0.9-smp/build SUBDIRS=/var/lib/dkms/megaraid_sas/v00.00.03.09/build modules....
cleaning build area....

DKMS: build Completed.

and presto, the module builds. Complete the task with a

# dkms install -m megaraid_sas -v v00.00.03.09
Running module version sanity check.

megaraid_sas.ko:
 - Original module
   - Found /lib/modules/2.6.16.27-0.9-smp/kernel/drivers/scsi/megaraid//megaraid_sas.ko
   - Storing in /var/lib/dkms/megaraid_sas/original_module/2.6.16.27-0.9-smp/x86_64/
   - Archiving for uninstallation purposes
 - Installation
   - Installing to /lib/modules/2.6.16.27-0.9-smp/kernel/drivers/scsi/megaraid//

/etc/modprobe.conf: added alias reference for 'megaraid_sas'
depmod.....

Saving old initrd as /boot/initrd-2.6.16.27-0.9-smp_old
Making new initrd as /boot/initrd-2.6.16.27-0.9-smp
(If next boot fails, revert to the _old initrd image)
mkinitrd.....

DKMS: install Completed.

and everything is good to go. Yay. Reboot. Doom averted. Go home.
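If you want to know whether the patch has made it into the source, before or after a build, you can look at which flag the request_irq() call uses. A minimal sketch follows. The helper name irq_flag_state is made up, and the source path is my assumption based on the /usr/src directory the rpm unpacked into, so adjust it to wherever megaraid_sas.c lives on your box.

```shell
#!/bin/sh
# Report which IRQ flag a driver source file passes to request_irq().
# irq_flag_state is a hypothetical helper name.
irq_flag_state() {
    # $1 = path to megaraid_sas.c
    if grep -q 'request_irq(.*IRQF_SHARED' "$1"; then
        echo "unpatched"   # the flag make.log reported as undeclared
    elif grep -q 'request_irq(.*SA_SHIRQ' "$1"; then
        echo "patched"     # what sles10-ga.patch changes the call to
    else
        echo "unknown"
    fi
}

# Assumed location of the unpacked driver source.
src=/usr/src/megaraid_sas-v00.00.03.09/megaraid_sas.c
if [ -f "$src" ]; then
    irq_flag_state "$src"
fi
```

An "unpatched" result on an updated SLES 10 kernel means the build will hit the same IRQF_SHARED error shown in make.log.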


A co-worker of mine subsequently came up with a much neater solution to this:

The fix is to make dkms aware of newer SLES 10 kernel versions by doing the following:
vi /usr/src/megaraid_sas-v00.00.03.09/dkms.conf

Change the following lines:

PATCH[6]="sles10-ga.patch"
PATCH_MATCH[6]="2\.6\.16\.21-0\.8"

to:

PATCH[6]="sles10-ga.patch"
PATCH_MATCH[6]="2\.6\.16\.2.-0\..*"
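
To see why the original pattern skipped the patch, you can test both patterns against the kernel release strings by hand. This sketch assumes dkms treats PATCH_MATCH as a regular expression matched against the kernel version, which grep -E stands in for here. The helper name matches is made up, and the GA kernel string 2.6.16.21-0.8-smp is my guess based on the original pattern.

```shell
#!/bin/sh
# Check which PATCH_MATCH patterns match a given kernel release.
# "matches" is a hypothetical helper; grep -E stands in for dkms's matching.
matches() {
    # $1 = pattern, $2 = kernel release string
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

old_pattern='2\.6\.16\.21-0\.8'
new_pattern='2\.6\.16\.2.-0\..*'

# First kernel: assumed GA release. Second: the one from the failed build.
for kernel in 2.6.16.21-0.8-smp 2.6.16.27-0.9-smp; do
    for pattern in "$old_pattern" "$new_pattern"; do
        if matches "$pattern" "$kernel"; then
            echo "$kernel matches $pattern"
        else
            echo "$kernel does NOT match $pattern"
        fi
    done
done
```

The original pattern only matches the GA kernel, so on 2.6.16.27-0.9-smp the patch is never selected. The loosened pattern matches both.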

Posted by delgado at May 10, 2007 8:48 PM

Comments

Thank you so much for the elegant write-up. I was about going out of my mind trying to google a solution. I had a custom kernel installed, so I also had trouble building the module. Needed to get the kernel source rpm installed and ensure that the /lib/modules//build and /lib/modules//source soft links were pointed correctly to the sources. Thanks again.

Posted by: Bill at July 13, 2007 2:49 PM

Here at work, we have varying dell poweredges. PowerEdge 750, 860, 2650, SC1435..etc. They all primarily run Enterprise Linux. I like the utility "afacli", which allows manipulation of the hardware RAID controller and such. It was semi standardized until we bought new servers. Is there a general linux solution that can do the same thing or an alternative program/script to install that would work with most raid controllers or just with the Perc 5/SAS on the new servers. Thanks!

Posted by: Brandon Clinger at July 17, 2007 3:07 PM

Thanx man, it works. Saved me a lot(!) of time.
Although the purpose of Dell's dkms project is very elegant, I hope the patch will also be in the mainstream kernel and not only in a separate module which must be dkms-based (and also suffers the patch error). Perhaps it already is, but I use the SLES10 kernel for now.

Posted by: Jorden at October 16, 2007 8:24 AM
