Managing your Dell or LSI Logic RAID card on VMware ESXi 5.x

Many of you have likely run into the same issue I did: you have a server in a remote datacenter with only DRAC6 access and no way to get at the RAID card without a reboot. Well, those days are gone! I have found a way to manage the RAID card from the ESXi shell and do pretty much anything you could do from the BIOS GUI, and more.

First, you need to get the software installed. There’s this awesome article I found for that:

http://de.community.dell.com/techcenter/support-services/w/wiki/909.how-to-install-megacli-on-esxi-5-x

The article includes the commands to view the controller log, which is important, but on its own that doesn’t let you do much.

View system logs:

Change to the MegaCLI folder with cd /opt/lsi/MegaCLI/ and use this command to display the controller log:

./MegaCli -FwTermLog -Dsply -aALL

To save it to a .txt file, use this:

./MegaCli -FwTermLog -Dsply -aALL > lsi.txt

The log file is now located in /opt/lsi/MegaCLI/
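
If you pull this log regularly, a fixed lsi.txt gets overwritten each time. Here is a minimal sketch of a wrapper that writes a timestamped file instead. The function name save_fw_log and the MEGACLI variable are my own (hypothetical) names, and the default path assumes the install location from the article above.

```shell
# Path to the MegaCli binary; override MEGACLI if yours lives elsewhere.
MEGACLI=${MEGACLI:-/opt/lsi/MegaCLI/MegaCli}

# Save the firmware terminal log to <dir>/lsi-YYYYMMDD-HHMMSS.txt
# and print the name of the file that was written.
save_fw_log() {
    dir=${1:-.}
    out="$dir/lsi-$(date +%Y%m%d-%H%M%S).txt"
    "$MEGACLI" -FwTermLog -Dsply -aALL > "$out" && echo "$out"
}
```

Calling save_fw_log /tmp would then leave something like /tmp/lsi-<date>-<time>.txt behind, ready to copy off the host.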

Here’s the list of other commands I have compiled from all over the net:

View information about the RAID adapter

For checking the firmware version, battery back-up unit presence, installed cache memory and the capabilities of the adapter:

# MegaCli -AdpAllInfo -aAll

View information about the battery backup unit state

# MegaCli -AdpBbuCmd -aAll

View information about virtual disks

Useful for checking RAID level, stripe size, cache policy and RAID state:

# MegaCli -LDInfo -Lall -aALL
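
For a quick health check you can filter that output down to the virtual disks that are not healthy. This is only a sketch: ld_check is a hypothetical helper name, and it assumes the output contains lines of the form "Virtual Drive: 0 (Target Id: 0)" and "State : Optimal", which may differ between MegaCli versions.

```shell
# Read "MegaCli -LDInfo -Lall -aALL" output on stdin.
# Prints one line per virtual disk whose state is not "Optimal"
# and returns 1 if any such disk was found, 0 otherwise.
ld_check() {
    awk -F':' '
        /^Virtual Drive/ { split($2, a, " "); vd = a[1] }
        /^State[ \t]*:/  {
            s = $2
            gsub(/^[ \t]+|[ \t]+$/, "", s)   # trim padding around the value
            if (s != "Optimal") { print "VD " vd ": " s; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Usage (not run here):
#   ./MegaCli -LDInfo -Lall -aALL | ld_check || echo "RAID needs attention"
```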

View information about physical drives

# MegaCli -PDList -aALL
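
The full drive list is long, so it can help to boil it down to one line per drive. A rough sketch follows; pd_summary is a hypothetical name, and it assumes each drive block lists "Enclosure Device ID", "Slot Number" and "Firmware state" in that order, which is how the output I have seen is laid out.

```shell
# Read "MegaCli -PDList -aALL" output on stdin and print
# one line per drive in the form "[E:S] <firmware state>".
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { print "[" enc ":" slot "] " $2 }
    '
}

# Usage (not run here):
#   ./MegaCli -PDList -aALL | pd_summary
```

The [E:S] pairs it prints are the same values the -PhysDrv parameter expects further down.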

Patrol read

Patrol read is a feature which tries to discover disk errors before it is too late and data is lost. By default it runs automatically (with a delay of 168 hours between patrol reads) and will take up to 30% of IO resources.

To see information about the patrol read state and the delay between patrol read runs:
# MegaCli -AdpPR -Info -aALL

To find out the current patrol read rate, execute
# MegaCli -AdpGetProp PatrolReadRate -aALL

To reduce patrol read resource usage to 2% in order to minimize the performance impact:
# MegaCli -AdpSetProp PatrolReadRate 2 -aALL

To disable automatic patrol read:
# MegaCli -AdpPR -Dsbl -aALL

To start a manual patrol read scan:
# MegaCli -AdpPR -Start -aALL

To stop a patrol read scan:
# MegaCli -AdpPR -Stop -aALL

You could use the above commands to run patrol read in off-peak times.
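
One way to do that is a small script called by whatever scheduler you have, which starts the scan inside a quiet window and stops it outside. The sketch below is an assumption-heavy example: patrol_action and patrol_window are hypothetical names, the 01:00 to 05:00 window is arbitrary, the window must not wrap past midnight, and whether the controller accepts -Start depends on the current patrol read mode (check with -AdpPR -Info first).

```shell
MEGACLI=${MEGACLI:-/opt/lsi/MegaCLI/MegaCli}

# Print the -AdpPR action for an hour (0-23): -Start inside the
# window [from, to), -Stop outside it. Defaults: from=1, to=5.
patrol_action() {
    hour=${1#0}; hour=${hour:-0}     # "08" -> "8", "00" -> "0"
    from=${2:-1}; to=${3:-5}
    if [ "$hour" -ge "$from" ] && [ "$hour" -lt "$to" ]; then
        echo "-Start"
    else
        echo "-Stop"
    fi
}

# Apply the action for the current hour; meant to be run by a scheduler.
patrol_window() {
    "$MEGACLI" -AdpPR "$(patrol_action "$(date +%H)")" -aALL
}
```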

Migrate from one RAID level to another

In this example, I migrate virtual disk 0 from RAID level 6 to RAID 5, so that the disk space of one additional disk becomes available. The second command makes Linux detect the new size of the RAID disk; it is Linux-specific, as is the /usr/local/sbin path in the first command.

# /usr/local/sbin/MegaCli -LDRecon -Start -r5 -L0 -a0
# echo 1 > /sys/block/sda/device/rescan

Create a new RAID 5 virtual disk from a set of new hard drives

First we need to know the enclosure and slot number of the hard drives we want to use for the new RAID disk. The first command below lists them. Then I add a virtual disk using RAID level 5, followed by the list of drives I want to use, specified in enclosure:slot syntax.

# MegaCli -PDList -aALL | egrep 'Adapter|Enclosure|Slot|Inquiry'
# MegaCli -CfgLdAdd -r5'[252:5,252:6,252:7]' -a0

Extending an existing RAID array with a new disk

First check the enclosure device ID and the slot number of the newly added disk with the command above. Then we reconstruct the logical drive, adding the new drive. For a RAID 5 array this command is used:

# MegaCli -LDRecon -Start -r5 -Add -PhysDrv[32:3] -L0 -a0

View reconstruction progress

When reconstructing a RAID array, you can check its progress with this command:
# MegaCli -LDRecon -ShowProg -L0 -a0

(replace -L0 with -L1 for the second virtual disk, and so on)

Configure write-cache to be disabled when battery is broken

# MegaCli -LDSetProp NoCachedBadBBU -LALL -aALL

Change physical disk cache policy

If your system is not connected to a UPS, you should disable the physical disk cache in order to prevent data loss. To see the current setting:

# MegaCli -LDGetProp -DskCache -LAll -aALL

To disable it:

# MegaCli -LDSetProp -DisDskCache -LAll -aALL

To enable it (only do this if you have a UPS and redundant power supplies):

# MegaCli -LDSetProp -EnDskCache -LAll -aALL

 

General Parameters

  • Adapter parameter -aN

The parameter -aN (where N is a number starting with zero or the string ALL) specifies the adapter ID. If you have only one controller it’s safe to use ALL instead of a specific ID, but you’re encouraged to use the ID for everything that makes changes to your RAID configuration.

  • Physical drive parameter      -PhysDrv [E:S]

For commands that operate on one or more physical drives, the -PhysDrv [E:S] parameter is used, where E is the enclosure device ID in which the drive resides and S the slot number (starting with zero). You can get the enclosure device ID using MegaCli -EncInfo -aALL. The E:S syntax is also used for specifying the physical drives when creating a new RAID virtual drive (see "Create a new RAID 5 virtual disk" above).

  • Virtual drive parameter -Lx

The parameter -Lx is used for specifying the virtual drive (where x is a number starting with zero or the string all).
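
Since a typo in an E:S pair can point a command at the wrong disk, it may be worth building the bracketed list with a small helper that rejects anything malformed. This is just a sketch; drive_list is a hypothetical name and not part of MegaCli. Note that the brackets should be quoted on the command line, because the shell could otherwise treat [32:3] as a filename pattern.

```shell
# Build a bracketed drive list from E:S pairs,
# e.g. "252:5 252:6 252:7" -> [252:5,252:6,252:7].
# Returns 1 (and prints nothing on stdout) if any pair is malformed.
drive_list() {
    list=""
    for pair in "$@"; do
        case $pair in
            *[!0-9:]*|*:*:*|:*|*:|"")
                echo "bad E:S pair: $pair" >&2; return 1 ;;
            *:*) ;;                                   # looks like digits:digits
            *)  echo "bad E:S pair: $pair" >&2; return 1 ;;
        esac
        list="${list:+$list,}$pair"
    done
    [ -n "$list" ] || return 1
    printf '[%s]\n' "$list"
}

# Usage (not run here):
#   MegaCli -PDInfo -PhysDrv "$(drive_list 32:3)" -a0
#   MegaCli -CfgLdAdd "-r5$(drive_list 252:5 252:6 252:7)" -a0
```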

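Running the executable can be accomplished by either of the following (with the ESXi install from the article above, the folder is /opt/lsi/MegaCLI instead):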

shell> /opt/MegaRAID/MegaCli/MegaCli <cmd>

or

shell> cd /opt/MegaRAID/MegaCli
shell> ./MegaCli <cmd>

Gather information

  • Controller information
     MegaCli -AdpAllInfo -aALL
     MegaCli -CfgDsply -aALL
     MegaCli -adpeventlog -getevents -f lsi-events.log -a0 -nolog
  • Enclosure information
     MegaCli -EncInfo -aALL
  • Virtual drive information
     MegaCli -LDInfo -Lall -aALL
  • Physical drive information
     MegaCli -PDList -aALL
     MegaCli -PDInfo -PhysDrv [E:S] -aALL
  • Battery backup information      (Cisco MSPs do not have the battery backup unit installed, but in case yours has one)
     MegaCli -AdpBbuCmd -aALL 

  • Check Battery backup warning on boot.  If this is enabled on an MSP, it will require manual intervention every time the system boots
     MegaCli -AdpGetProp BatWarnDsbl -a0 

Controller management

  • Silence active alarm
     MegaCli -AdpSetProp AlarmSilence -aALL
  • Disable alarm
     MegaCli -AdpSetProp AlarmDsbl -aALL
  • Enable alarm
     MegaCli -AdpSetProp AlarmEnbl -aALL
  • Disable battery backup warning on system boot
     MegaCli -AdpSetProp BatWarnDsbl -a0
  • Change the adapter rebuild rate to 60%
     MegaCli -AdpSetProp RebuildRate -60 -aALL

Virtual drive management

  • Create RAID 0, 1, 5 drive
     MegaCli -CfgLdAdd -r(0|1|5) [E:S, E:S, ...] -aN
  • Create RAID 10 drive
     MegaCli -CfgSpanAdd -r10 -Array0[E:S,E:S] -Array1[E:S,E:S] -aN
  • Remove drive
     MegaCli -CfgLdDel -Lx -aN

Physical drive management

  • Set state to offline
     MegaCli -PDOffline -PhysDrv [E:S] -aN
  • Set state to online
     MegaCli -PDOnline -PhysDrv [E:S] -aN
  • Mark as missing
     MegaCli -PDMarkMissing -PhysDrv [E:S] -aN
  • Prepare for removal
     MegaCli -PdPrpRmv -PhysDrv [E:S] -aN
  • Replace missing drive
     MegaCli -PdReplaceMissing -PhysDrv [E:S] -ArrayN -rowN -aN

The number N of the array parameter is the Span Reference you get using MegaCli -CfgDsply -aALL and the number N of the row parameter is the Physical Disk in that span or array starting with zero (it’s not the physical disk’s slot!).

  • Rebuild drive – Drive status should be “Firmware state: Rebuild”
     MegaCli -PDRbld -Start -PhysDrv [E:S] -aN
     MegaCli -PDRbld -Stop -PhysDrv [E:S] -aN
     MegaCli -PDRbld -ShowProg -PhysDrv [E:S] -aN     
     MegaCli -PDRbld -ProgDsply -physdrv [E:S] -aN
  • Clear drive
     MegaCli -PDClear -Start -PhysDrv [E:S] -aN
     MegaCli -PDClear -Stop -PhysDrv [E:S] -aN
     MegaCli -PDClear -ShowProg -PhysDrv [E:S] -aN
  • Bad to good
     MegaCli -PDMakeGood -PhysDrv[E:S] -aN

Changes drive in state Unconfigured-Bad to Unconfigured-Good.

Hot spare management

  • Set global hot spare
     MegaCli -PDHSP -Set -PhysDrv [E:S] -aN

  • Remove hot spare
     MegaCli -PDHSP -Rmv -PhysDrv [E:S] -aN

  • Set dedicated hot spare
     MegaCli -PDHSP -Set -Dedicated -ArrayN,M,... -PhysDrv [E:S] -aN

Walkthrough: Rebuild a Drive that is marked ‘Foreign’ when Inserted:


  • Bad to good
    MegaCli -PDMakeGood -PhysDrv [E:S]  -aALL
  • Clear the foreign setting
     MegaCli -CfgForeign -Clear -aALL

  • Set global hot spare
     MegaCli -PDHSP -Set -PhysDrv [E:S] -aN

Walkthrough: Change/replace a drive

1. Set the drive offline, if it is not already offline due to an error

     MegaCli -PDOffline -PhysDrv [E:S] -aN

2. Mark the drive as missing

     MegaCli -PDMarkMissing -PhysDrv [E:S] -aN

3. Prepare drive for removal

     MegaCli -PDPrpRmv -PhysDrv [E:S] -aN

4. Change/replace the drive

5. If you’re using hot spares then the replaced drive should become your new hot spare drive

     MegaCli -PDHSP -Set -PhysDrv [E:S] -aN

6. In case you’re not working with hot spares, you must re-add the new drive to your RAID virtual drive and start the rebuilding

     MegaCli -PdReplaceMissing -PhysDrv [E:S] -ArrayN -rowN -aN
     MegaCli -PDRbld -Start -PhysDrv [E:S] -aN
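
Because the order of these steps matters, I find it safer to print the whole sequence for a given drive and review it before running anything. The sketch below only prints commands and never executes them; replace_drive_plan is a hypothetical name, and the arguments are the E:S pair, the adapter number, and optionally the array and row numbers for the no-hot-spare case.

```shell
# Print the drive replacement steps for one drive.
#   $1 = E:S pair, $2 = adapter number,
#   $3/$4 = array and row (optional; if given, the plan ends with
#           -PdReplaceMissing and a rebuild instead of a hot spare).
replace_drive_plan() {
    es=$1; adp=$2; array=$3; row=$4
    echo "MegaCli -PDOffline -PhysDrv '[$es]' -a$adp"
    echo "MegaCli -PDMarkMissing -PhysDrv '[$es]' -a$adp"
    echo "MegaCli -PDPrpRmv -PhysDrv '[$es]' -a$adp"
    echo "# physically swap the drive now"
    if [ -n "$array" ] && [ -n "$row" ]; then
        echo "MegaCli -PdReplaceMissing -PhysDrv '[$es]' -Array$array -row$row -a$adp"
        echo "MegaCli -PDRbld -Start -PhysDrv '[$es]' -a$adp"
    else
        echo "MegaCli -PDHSP -Set -PhysDrv '[$es]' -a$adp"
    fi
}
```

For example, replace_drive_plan 32:3 0 prints the hot spare variant, and replace_drive_plan 32:3 0 1 2 prints the variant that re-adds the drive to array 1, row 2.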

Gathering Standard logs

On every instance of a hard drive problem with an MSP server, we need to run the following commands to have any information about the problem:

   shell> rm -f MegaSAS.log
   shell> /opt/MegaRAID/MegaCli/MegaCli -adpallinfo -a0
   shell> /opt/MegaRAID/MegaCli/MegaCli -encinfo -a0
   shell> /opt/MegaRAID/MegaCli/MegaCli -ldinfo -lall -a0
   shell> /opt/MegaRAID/MegaCli/MegaCli -pdlist -a0
   shell> /opt/MegaRAID/MegaCli/MegaCli -adpeventlog -getevents -f lsi-events.log -a0 -nolog
   shell> /opt/MegaRAID/MegaCli/MegaCli -fwtermlog -dsply -a0 -nolog > lsi-fwterm.log

Collect the MegaSAS.log, lsi-events.log, and the lsi-fwterm.log files from the directory where the commands are run (they can be run from any directory on the MSP server) and attach the logs to the service request. You may use a program such as WinSCP (freeware) to pull the files off of the server.
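
If you collect these logs often, the same commands can be wrapped in one function that runs them in a directory of your choice. This is a sketch under a few assumptions: collect_raid_logs and the MEGACLI variable are hypothetical names, MEGACLI must be an absolute path (the function changes directory), and it relies on MegaCli writing MegaSAS.log into the current directory as described above.

```shell
# Absolute path to the MegaCli binary; adjust for your system.
MEGACLI=${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli}

# Run the standard log-gathering commands inside directory $1
# (default: current directory) so all log files end up there.
collect_raid_logs() {
    dest=${1:-.}
    mkdir -p "$dest" || return 1
    (
        cd "$dest" || exit 1
        rm -f MegaSAS.log
        for cmd in "-adpallinfo" "-encinfo" "-ldinfo -lall" "-pdlist"; do
            # $cmd is deliberately unquoted so "-ldinfo -lall" becomes two arguments.
            "$MEGACLI" $cmd -a0 > /dev/null || echo "warning: $cmd failed" >&2
        done
        "$MEGACLI" -adpeventlog -getevents -f lsi-events.log -a0 -nolog
        "$MEGACLI" -fwtermlog -dsply -a0 -nolog > lsi-fwterm.log
    )
}
```

After collect_raid_logs /tmp/raid-logs, the three files to attach should all be in /tmp/raid-logs.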

 

I hope having all this info in one place is useful to you; it took me a day to figure it all out.

 

Mike