Monitoring HP SmartArray with the hpacucli utility

How to monitor your HP SmartArray status from your Check_MK installation

First, on your Nagios / Check_MK server, create a script that runs the hpacucli utility on your Linux server over SSH. In the line beginning hpacucli=, replace YOURSERVERNAME with the server's hostname as it appears in Check_MK.

mkdir -p /scripts/
nano /scripts/raid-monitoring

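#!/bin/sh
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

PROGPATH=/usr/lib/nagios/plugins
hpacucli="ssh root@YOURSERVERNAME hpacucli"   # replace YOURSERVERNAME with your host
PROGNAME=`basename "$0"`

# utils.sh supplies the STATE_* exit codes and print_revision
. $PROGPATH/utils.sh

# Minimal stand-ins: the option loop below calls these, but the script never defined them
print_usage() { echo "Usage: $PROGNAME [-v] [-p] [-d] [-e slot] [-E chassis] [-V] [-h]"; }
print_help() { print_usage; }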

while getopts "N:cvpsde:E:Vh" options
do
    case $options in
        N) ;;
        c) ;;
        v) VERBOSE=1;;
        p) PHYSICAL_DRIVE=1;;
        s) HPSA=1;;
        d) DEBUG=1;;
        e) EXCLUDE_SLOT=1
           excludeslot="$OPTARG";;
        E) EXCLUDE_CH=1
           excludech="$OPTARG";;
        V) print_revision $PROGNAME $REVISION
           exit 0;;
        h) print_help
           exit 0;;
        \?) print_usage
            exit 0;;
        *) print_usage
           exit 0;;
    esac
done

# Check if "HP Controller" work correctly
check=`$hpacucli controller all show status 2>&1`
status=$?
if [ "$DEBUG" = "1" ]; then
    echo "### Check if \"HP Controller\" work correctly >>>\n"${check}"\n"
fi
if test ${status} -ne 0; then
    echo "RAID UNKNOWN - $hpacucli did not execute properly : "${check}
    exit $STATE_UNKNOWN
fi

# Get "Slot" & exclude slot needed
if [ "$EXCLUDE_SLOT" = "1" ]; then
    slots=`echo ${check} | grep -E -o "Slot \w" | awk '{print $NF}' | grep -v "$excludeslot"`
else
    slots=`echo ${check} | grep -E -o "Slot \w" | awk '{print $NF}'`
fi
if [ "$DEBUG" = "1" ]; then
    echo "### Get \"Slot\" & exclude slot not needed >>>\n"${slots}"\n"
fi
for slot in $slots
do
    # Get "logicaldrive" for slot
    check2b=`$hpacucli controller slot=$slot logicaldrive all show 2>&1`
    status=$?
    if test ${status} -ne 0; then
        echo "RAID UNKNOWN - $hpacucli did not execute properly : "${check2b}
        exit $STATE_UNKNOWN
    fi
    check2="$check2$check2b"
    if [ "$DEBUG" = "1" ]; then
        echo "### Get \"logicaldrive\" for slot >>>\n"${check2b}"\n"
    fi

    # Get "physicaldrive" for slot
    if [ "$PHYSICAL_DRIVE" = "1" -o "$DEBUG" = "1" ]; then
        check2b=`$hpacucli controller slot=$slot physicaldrive all show | sed -e 's/\?/\-/g' 2>&1 | grep "physicaldrive"`
    else
        check2b=`$hpacucli controller slot=$slot physicaldrive all show | sed -e 's/\?/\-/g' 2>&1 | grep "physicaldrive" | grep "\(Failure\|Failed\|Rebuilding\)"`
    fi
    status=$?
    if [ "$PHYSICAL_DRIVE" = "1" -o "$DEBUG" = "1" ]; then
        if test ${status} -ne 0; then
            echo "RAID UNKNOWN - $hpacucli did not execute properly : "${check2b}
            exit $STATE_UNKNOWN
        fi
    fi
    check2="$check2$check2b"
    if [ "$DEBUG" = "1" ]; then
        echo "### Get \"physicaldrive\" for slot >>>\n"${check2b}"\n"
    fi
done

# Get "Chassis" & exclude chassis not needed
if [ "$EXCLUDE_CH" = "1" ]; then
    chassisnames=`echo ${check} | grep -v "in a scenario of" | grep -E -o "in \w+" | grep -E -v "Slot" | awk '{print $NF}' | grep -v "$excludech"`
else
    chassisnames=`echo ${check} | grep -v "in a scenario of" | grep -E -o "in \w+" | grep -E -v "Slot" | awk '{print $NF}'`
fi
if [ "$DEBUG" = "1" ]; then
    echo "### Get \"Chassis\" & exclude chassis not needed >>>\n"${chassisnames}"\n"
fi
for chassisname in $chassisnames
do
    # Get "logicaldrive" for chassisname
    check2b=`$hpacucli controller chassisname="$chassisname" logicaldrive all show 2>&1`
    status=$?
    if test ${status} -ne 0; then
        echo "RAID UNKNOWN - $hpacucli did not execute properly : "${check2b}
        exit $STATE_UNKNOWN
    fi
    check2="$check2$check2b"
    if [ "$DEBUG" = "1" ]; then
        echo "### Get \"logicaldrive\" for chassisname >>>\n"${check2b}"\n"
    fi

    # Get "physicaldrive" for chassisname
    if [ "$PHYSICAL_DRIVE" = "1" -o "$DEBUG" = "1" ]; then
        check2b=`$hpacucli controller chassisname="$chassisname" physicaldrive all show | sed -e 's/\?/\-/g' 2>&1 | grep "physicaldrive"`
    else
        check2b=`$hpacucli controller chassisname="$chassisname" physicaldrive all show | sed -e 's/\?/\-/g' 2>&1 | grep "physicaldrive" | grep "\(Failure\|Failed\|Rebuilding\)"`
    fi
    status=$?
    if [ "$PHYSICAL_DRIVE" = "1" -o "$DEBUG" = "1" ]; then
        if test ${status} -ne 0; then
            echo "RAID UNKNOWN - $hpacucli did not execute properly : "${check2b}
            exit $STATE_UNKNOWN
        fi
    fi
    check2="$check2$check2b"
    if [ "$DEBUG" = "1" ]; then
        echo "### Get \"physicaldrive\" for chassisname >>>\n"${check2b}"\n"
    fi
done

# Check STATUS
if [ "$DEBUG" = "1" ]; then
    echo "### Check STATUS >>>"
fi
if echo ${check} | grep -E Failed >/dev/null; then
    echo "RAID CRITICAL - HP Smart Array Failed: "${check} | grep -E Failed
    exit $STATE_CRITICAL
elif echo ${check} | grep -E Disabled >/dev/null; then
    echo "RAID CRITICAL - HP Smart Array Problem: "${check} | grep -E Disabled
    exit $STATE_CRITICAL
elif echo ${check2} | grep -E Failed >/dev/null; then
    echo "RAID CRITICAL - HP Smart Array Failed: "${check2} | grep -E Failed
    exit $STATE_CRITICAL
elif echo ${check2} | grep -E Failure >/dev/null; then
    echo "RAID WARNING - Component Failure: "${check2} | grep -E Failure
    exit $STATE_WARNING
elif echo ${check2} | grep -E Rebuild >/dev/null; then
    echo "RAID WARNING - HP Smart Array Rebuilding: "${check2} | grep -E Rebuild
    exit $STATE_WARNING
elif echo ${check2} | grep -E Recover >/dev/null; then
    echo "RAID WARNING - HP Smart Array Recovering: "${check2} | grep -E Recover
    exit $STATE_WARNING
elif echo ${check} | grep -E "Cache Status: Temporarily Disabled" >/dev/null; then
    echo "RAID WARNING - HP Smart Array Cache Disabled: "${check}
    exit $STATE_WARNING
elif echo ${check} | grep -E FIRMWARE >/dev/null; then
    echo "RAID WARNING - "${check}
    exit $STATE_WARNING
else
    if [ "$DEBUG" = "1" -o "$VERBOSE" = "1" ]; then
        check3=`echo "${check}" | grep -E Status`
        check3=`echo ${check3}`
        echo "RAID OK: "${check2}" ["${check3}"]"
    else
        echo "RAID OK"
    fi
    exit $STATE_OK
fi

exit $STATE_UNKNOWN

Note: I cannot take credit for this script; I obtained it through the Nagios / Check_MK forums.
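To make the parsing steps easier to follow, here is a minimal standalone sketch of how slot numbers can be pulled out of the controller status text and how drive text maps to a state. The function names extract_slots and classify_raid_text are invented for illustration and are not part of the script above. The sample text only approximates what hpacucli prints, and the classification is simplified (it leaves out the cache and firmware checks). It uses the POSIX class [[:alnum:]_] in place of the script's \w.

```shell
#!/bin/sh
# Illustrative sketch only: function names and sample text are assumptions,
# not part of the original script and not captured hpacucli output.

# Print the slot identifier of every controller named in the status text ($1).
extract_slots() {
    printf '%s\n' "$1" | grep -E -o 'Slot [[:alnum:]_]' | awk '{print $NF}'
}

# Map drive status text ($1) to a Nagios state name, using the same keyword
# precedence as the script: Failed/Disabled first, then the warning keywords.
classify_raid_text() {
    case "$1" in
        *Failed*|*Disabled*)            echo "CRITICAL" ;;
        *Failure*|*Rebuild*|*Recover*)  echo "WARNING" ;;
        *)                              echo "OK" ;;
    esac
}

# Made-up sample in the general shape of "controller all show status"
sample_status="Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
Smart Array P812 in Slot 1
   Controller Status: OK"

extract_slots "$sample_status"
classify_raid_text "physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, Predictive Failure)"
```

Like the script, this matches a single character after "Slot", so a controller in a two-digit slot would be truncated to its first digit.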

We will now need to make this script executable:

chmod +x /scripts/raid-monitoring
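Once the SSH access described further down is in place, it can help to run the script by hand and look at its exit code, since Nagios decides the service state from that code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch; run_check and nagios_state_name are hypothetical helper names introduced here, not part of Nagios or Check_MK:

```shell
#!/bin/sh
# Hypothetical helpers for testing a plugin by hand.

# Translate a plugin exit code ($1) into the standard Nagios state name.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID($1)" ;;
    esac
}

# Run the given command, print its output, then print the resulting state.
# The command's own exit code is passed back to the caller.
run_check() {
    output=$("$@" 2>&1)
    rc=$?
    echo "$output"
    echo "state: $(nagios_state_name "$rc")"
    return "$rc"
}

# Example (uncomment to run against the script from this guide):
# run_check /scripts/raid-monitoring -v
```

Run it as the nagios user, because that is the account Nagios will use to execute the check.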

Next, configure the main.mk file in Check_MK to run this script for your host. If you have already used this file to configure other checks, it will likely contain the sections below; in that case, add the command definition and the check entry to the relevant existing sections. If you have not configured any additional checks in Check_MK, you can paste in everything below, replacing YOURSERVERNAME with your server's hostname.

nano /etc/check_mk/main.mk

extra_nagios_conf += r"""

define command{
    command_name linux-raid-mon
    command_line /scripts/raid-monitoring -v
}

"""

legacy_checks = [
    ( ( "linux-raid-mon", "HP-Raid-Status", True), [ "YOURSERVERNAME" ] ),
]

Now save and close this file.

Next, set up passwordless SSH from your local Nagios user to the Linux server that has the hpacucli utility installed. Switch to your nagios user first with su nagios. If you get the 'This account is currently not available.' message, the nagios account has no login shell; please see this guide.

Once this is set up, you should be able to SSH to the root user on your remote server as follows. Make sure this works before reloading Nagios, otherwise the check will fail.

su nagios
ssh root@[yourserver]
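One possible way to set up and verify the key-based login, assuming OpenSSH on both machines. The setup commands are shown as comments because they change state on both hosts. The function check_passwordless_ssh and the SSH_CMD override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# One-time setup (assumes OpenSSH; adjust to your environment):
#   su -s /bin/sh nagios          # as root; -s supplies a shell if the account has none
#   ssh-keygen -t rsa -b 4096     # accept the default path, leave the passphrase empty
#   ssh-copy-id root@YOURSERVERNAME

# SSH_CMD is a hypothetical override so the function can be exercised
# without a real remote host; it defaults to the real ssh client.
SSH_CMD=${SSH_CMD:-ssh}

# Succeeds only if root@$1 accepts a key-based login without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless_ssh() {
    if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true 2>/dev/null; then
        echo "OK: key-based login to root@$1 works"
    else
        echo "FAIL: root@$1 still needs a password or is unreachable"
        return 1
    fi
}

# Example (uncomment and substitute your host):
# check_passwordless_ssh YOURSERVERNAME
```

The check is worth running as the nagios user, since the key has to belong to the account that executes the plugin.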

Finally, reload Nagios / Check_MK. You should then see the new check appear for your host in the Check_MK web interface.

cmk -O
