NAME
cciss_vol_status - show status of logical drives attached to HP
Smart Array controllers
SYNOPSIS
cciss_vol_status [OPTION] [DEVICE]...
DESCRIPTION
Shows the status of logical drives configured on HP Smart Array
controllers.
OPTIONS
-p, --persnickety
Without this option, device nodes which can’t be opened, or
which are not found to be of the correct device type are
silently ignored. This lets you use wildcards, e.g.:
cciss_vol_status /dev/sg* /dev/cciss/c*d0, and the program will
not complain as long as all devices which are found to be of the
correct type are found to be ok. However, you may wish to
explicitly list the devices you expect to be there, and be
notified if they are not there (e.g. perhaps a PCI slot has
died, and the system has rebooted, so that what was once
/dev/cciss/c1d0 is no longer there at all). This option will
cause the program to complain about any device node listed which
does not appear to be the right device type, or is not openable.
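A monitoring script can approximate part of this check before the
program is even run. The following Python sketch is illustrative only
and is not part of cciss_vol_status: the function names and the list of
expected devices are hypothetical, and it tests only that each node
exists, not that it is of the right device type.

```python
import os


def missing_devices(expected, exists=os.path.exists):
    """Return the expected device nodes that are not present.

    `exists` is injectable so the check can be exercised without
    real device nodes.
    """
    return [dev for dev in expected if not exists(dev)]


def build_command(expected):
    """Build a cciss_vol_status command line that names every expected
    device explicitly and passes -p, so that nodes which are absent or
    of the wrong type are reported instead of silently ignored."""
    return ["cciss_vol_status", "-p", *expected]
```

For example, build_command(["/dev/cciss/c0d0", "/dev/cciss/c1d0"])
yields a command that complains if c1d0 has disappeared, whereas a
wildcard such as /dev/cciss/c*d0 would simply no longer match it.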
-C, --copyright
If stderr is a terminal, print a copyright message and exit.
-q, --quiet
This option does nothing. Previously, if this option was not
given and stderr was a terminal, a copyright message preceded
the normal program output. Now the copyright message is printed
only via the -C option.
-u, --try-unknown-devices
If a device has an unrecognized board ID, normally the program
will not attempt to communicate with it. In case you have some
Smart Array controller which is newer than this program, the
program may not recognize it. This option permits the program
to attempt to interrogate the board even if it is unrecognized
on the assumption that it is in fact a Smart Array of some kind.
-v, --version
Print the version number and exit.
-x, --exhaustive
Deprecated. Previously, this option "exhaustively" searched for
logical drives because, under some circumstances, some logical
drives might otherwise be missed. This option no longer does
anything, as the algorithm for finding logical drives was changed
to obviate the need for it.
DEVICE
The DEVICE argument indicates which RAID controller is to be queried.
Note that it identifies a RAID controller, not a logical drive.
For the cciss driver, the "d0" nodes matching "/dev/cciss/c*d0" are the
nodes which correspond to the RAID controllers. (See note 1, below.)
It is not necessary to invoke cciss_vol_status on each logical drive
individually, though if you do this, each time it will report the
status of ALL logical drives on the controller.
For the hpsa driver, or for fibre attached MSA1000 family devices, or
for the hpahcisr software RAID driver which emulates Smart Arrays, the
RAID controller is accessed via the SCSI generic driver, and the device
nodes will match "/dev/sg*". Some variants of the "lsscsi" tool will
easily identify which device node corresponds to the RAID controller.
Some variants may only report the SCSI nexus (controller/bus/target/lun
tuple). Some distros may not have the lsscsi tool.
Executing the following query to the /sys filesystem and correlating
this with the contents of /proc/scsi/scsi or output of lsscsi can help
in finding the right /dev/sg node to use with cciss_vol_status:
wumpus:/home/scameron # ls -l /sys/class/scsi_generic/*
lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg0 -> ../../devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:03.0/host0/target0:0:0/0:0:0:0/scsi_generic/sg0
lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg1 -> ../../devices/pci0000:00/0000:00:1f.1/host2/target2:0:0/2:0:0:0/scsi_generic/sg1
lrwxrwxrwx 1 root root 0 2009-11-19 07:47 /sys/class/scsi_generic/sg2 -> ../../devices/pci0000:00/0000:00:05.0/0000:0e:00.0/host4/target4:3:0/4:3:0:0/scsi_generic/sg2
wumpus:/home/scameron # cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: COMPAQ Model: BD03685A24 Rev: HPB6
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: SAMSUNG Model: CD-ROM SC-148A Rev: B408
Type: CD-ROM ANSI SCSI revision: 05
Host: scsi4 Channel: 03 Id: 00 Lun: 00
Vendor: HP Model: P800 Rev: 6.82
Type: RAID ANSI SCSI revision: 00
wumpus:/home/scameron # lsscsi
[0:0:0:0] disk COMPAQ BD03685A24 HPB6 /dev/sda
[2:0:0:0] cd/dvd SAMSUNG CD-ROM SC-148A B408 /dev/sr0
[4:3:0:0] storage HP P800 6.82 -
From the above you can see that /dev/sg2 corresponds to SCSI nexus
4:3:0:0, which corresponds to the HP P800 RAID controller listed in
/proc/scsi/scsi.
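The same correlation can be automated. The Python sketch below is an
illustration under stated assumptions, not something shipped with
cciss_vol_status: the function names are hypothetical, and it assumes
that /proc/scsi/scsi and the /sys/class/scsi_generic link targets look
like the samples above, which may not hold on every kernel.

```python
import re

# "Host: scsi4 Channel: 03 Id: 00 Lun: 00" as printed by /proc/scsi/scsi
HOST_RE = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)")
TYPE_RE = re.compile(r"\s*Type:\s*(\S+)")
# ".../4:3:0:0/scsi_generic/sg2" at the end of a sysfs link target
LINK_RE = re.compile(r"/(\d+):(\d+):(\d+):(\d+)/scsi_generic/(sg\d+)$")


def raid_nexuses(proc_scsi_text):
    """Return the set of (host, channel, id, lun) tuples of Type RAID."""
    found = set()
    nexus = None
    for line in proc_scsi_text.splitlines():
        m = HOST_RE.search(line)
        if m:
            nexus = tuple(int(x) for x in m.groups())
            continue
        m = TYPE_RE.match(line)
        if m and nexus is not None and m.group(1) == "RAID":
            found.add(nexus)
    return found


def raid_sg_nodes(link_targets, proc_scsi_text):
    """Return the /dev/sg* nodes whose SCSI nexus is a RAID device.

    `link_targets` is an iterable of symlink targets such as those
    printed by `ls -l /sys/class/scsi_generic/*`.
    """
    raid = raid_nexuses(proc_scsi_text)
    nodes = []
    for target in link_targets:
        m = LINK_RE.search(target)
        if m and tuple(int(x) for x in m.groups()[:4]) in raid:
            nodes.append("/dev/" + m.group(5))
    return sorted(nodes)
```

On a live system the link targets could be gathered with
os.path.realpath() over glob.glob("/sys/class/scsi_generic/*") and the
text read from /proc/scsi/scsi, where that file is available.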
EXAMPLE
[root@somehost]# cciss_vol_status -q /dev/cciss/c*d0
/dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
/dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 1 status: OK.
/dev/cciss/c0d0: (Smart Array P800) RAID 1 Volume 2 status: OK.
/dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
/dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
/dev/cciss/c0d0: (Smart Array P800) Enclosure MSA60 (S/N: USP6340B3F) on Bus 2, Physical Port 1E status: Power Supply Unit failed
/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 0 status: OK.
/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 1 status: OK.
/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 2 status: OK.
/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 3 status: OK.
/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 6 status: OK.
/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 7 status: OK.
[root@someotherhost]# cciss_vol_status -q /dev/sg0 /dev/cciss/c*d0
/dev/sg0: (MSA1000) RAID 1 Volume 0 status: OK. At least one spare drive.
/dev/sg0: (MSA1000) RAID 5 Volume 1 status: OK.
/dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
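Output of this shape is straightforward to post-process. The Python
sketch below is a hedged illustration: the line layout is inferred
solely from the examples above and may not cover every line the
program can print, and the names parse_line and not_ok are hypothetical.

```python
import re

# DEVICE: (MODEL) SUBJECT status: STATUS
# SUBJECT is e.g. "RAID 5 Volume 4" or an enclosure description.
LINE_RE = re.compile(
    r"^(?P<device>\S+): \((?P<model>[^)]*)\) "
    r"(?P<subject>.*?) status: (?P<status>.*)$"
)


def parse_line(line):
    """Split one output line into a dict, or return None if the line
    does not have the expected shape."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def not_ok(lines):
    """Return the parsed records whose status does not begin with "OK."."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r and not r["status"].startswith("OK.")]
```

Applied to the first example above, not_ok() would return only the
MSA60 enclosure line, since its status is "Power Supply Unit failed".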
DIAGNOSTICS
Normally, a logical drive in good working order should report a status
of "OK." Possible status values are:
"OK." (0) - The logical drive is in good working order.
"FAILED." (1) - The logical drive has failed, and no i/o to it is
possible.
"Using interim recovery mode." (3) - One or more drives has failed,
but not so many that the logical drive can no longer operate.
The failed drives should be replaced as soon as possible.
"Ready for recovery operation." (4) - Failed drive(s) have been
replaced, and the controller is about to begin rebuilding
redundant parity data.
"Currently recovering." (5) - Failed drive(s) have been replaced,
and the controller is currently rebuilding redundant parity
information.
"Wrong physical drive was replaced." (6) - A drive has failed, and
another (working) drive was replaced.
"A physical drive is not properly connected." (7) - There is some
cabling or backplane problem in the drive enclosure.
(From fwspecwww.doc, see cpqarray project on sourceforge.net):
Note: If the unit_status value is 6 (Wrong physical drive was
replaced) or 7 (A physical drive is not properly connected), the
unit_status of all other configured logical drives will be
marked as 1 (Logical drive failed). This is to force the user to
correct the problem and to ensure that once the problem is
corrected, the data will not have been corrupted by any user
action.
"Hardware is overheating." (8) - Hardware is too hot.
"Hardware was overheated." (9) - At some point in the past,
the hardware got too hot.
"Currently expannding." (10) - The controller is currently in the
process of expanding a logical drive.
"Not yet available." (11) - The logical drive is not yet finished
being configured.
"Queued for expansion." (12) - The logical drive will be expanded
when the controller is able to begin working on it.
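For scripts that need the numeric value rather than the text, the
list above can be captured as a lookup table. This Python sketch is an
illustration only; the names STATUS_CODES and status_code are
hypothetical, and the message strings are copied verbatim from the
list above (spelling included) on the assumption that they match what
the program prints.

```python
# Status message -> numeric value, exactly as listed above.
# Value 2 does not appear in the list and is therefore omitted.
STATUS_CODES = {
    "OK.": 0,
    "FAILED.": 1,
    "Using interim recovery mode.": 3,
    "Ready for recovery operation.": 4,
    "Currently recovering.": 5,
    "Wrong physical drive was replaced.": 6,
    "A physical drive is not properly connected.": 7,
    "Hardware is overheating.": 8,
    "Hardware was overheated.": 9,
    "Currently expannding.": 10,  # spelling as given in the list above
    "Not yet available.": 11,
    "Queued for expansion.": 12,
}


def status_code(status_text):
    """Return the numeric value for a status string, or None.

    A prefix match is used because additional messages (for example
    spare drive status) may follow the logical drive status.
    """
    for message, code in STATUS_CODES.items():
        if status_text.startswith(message):
            return code
    return None
```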
Additionally, the following messages may appear regarding spare drive
status:
"At least one spare drive designated"
"At least one spare drive activated and currently rebuilding"
"At least one activated on-line spare drive is completely rebuilt on this logical drive"
"At least one spare drive has failed"
"At least one spare drive activated"
"At least one spare drive remains available"
For each logical drive, the total number of failed physical drives, if
more than zero, will be reported as:
"Total of n failed physical drives detected on this logical drive."
with "n" replaced by the actual number, of course.
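A script can recover "n" from that message with a small pattern match.
The Python sketch below is illustrative; the function name is
hypothetical, and accepting a singular "drive" is an assumption made
for robustness rather than documented behavior.

```python
import re

FAILED_RE = re.compile(
    r"Total of (\d+) failed physical drives? detected on this logical drive")


def failed_drive_count(text):
    """Return the reported number of failed physical drives.

    Returns 0 when the message is absent, since the message is only
    printed when the count is greater than zero.
    """
    m = FAILED_RE.search(text)
    return int(m.group(1)) if m else 0
```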
Additionally, failure conditions of disk enclosure fans, power supplies,
and temperature are reported as follows:
"Fan failed"
"Temperature problem"
"Door alert"
"Power Supply Unit failed"
FILES
/dev/cciss/c*d0 (Smart Array PCI controllers using the cciss driver)
/dev/sg* (Fibre attached MSA1000 controllers and Smart Array
controllers using the hpsa driver or hpahcisr software RAID driver.)
EXIT CODES
0 - All configured logical drives queried have status of "OK."
1 - One or more configured logical drives queried have status other
than "OK."
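These exit codes make the program easy to call from a monitoring
script. The following Python sketch is a minimal illustration, not an
official wrapper: check_volumes and main are hypothetical names, and
because only codes 0 and 1 are documented here, any nonzero exit code
is treated as "not all OK".

```python
import subprocess
import sys


def check_volumes(command):
    """Run `command` (an argument list) and return (all_ok, output).

    all_ok is True only when the command exits with code 0.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0, result.stdout


def main(argv):
    """Run cciss_vol_status on the given devices and echo its output."""
    ok, output = check_volumes(["cciss_vol_status"] + list(argv))
    sys.stdout.write(output)
    return 0 if ok else 1
```

A caller might invoke main(["/dev/cciss/c0d0"]) and raise an alert
whenever the return value is nonzero.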
AUTHOR
Written by Stephen M. Cameron
REPORTING BUGS
MSA500 G1 logical drive numbers may not be reported correctly.
I’ve seen enclosure serial numbers contain garbage.
Report bugs to <steve.cameron@hp.com>
COPYRIGHT
Copyright © 2007 Hewlett-Packard Development Company, L.P.
This is free software; see the source for copying conditions. There is
NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
SEE ALSO
http://cciss.sourceforge.net
NOTE 1
The /dev/cciss/c*d0 device nodes of the cciss driver do double duty.
They serve as an access point to both the RAID controllers, and to the
first logical drive of each RAID controller. Notice that a
/dev/cciss/c*d0 node will be present for each controller even if no
logical drives are configured on that controller. It might be cleaner
if the driver had a special device node just for the controller,
instead of making these device nodes do double duty. It has been like
that since the Linux 2.2 kernel timeframe. At that time, device major
and minor numbers were statically allocated at compile time, and were in
short supply. Changing this behavior at this point would break lots of
userland programs.