
Re: [hobbit] Alert if a list of devices that are not related are all down?



In <2B2CEF0E4EE10B449E5D9BB95E6DA0E8FD11A0 (at) MAIL2.csw.l-3com.com> d.tom.schmitt (at) L-3com.com writes:

>I would like to have XYMON:
>         E.G.  All printers are down at the same time - ALERT
>               All printers just came back up at the same time - NOTIFY/ALERT

>I need to watch multiple printers in a building to see if they are ALL
>down (or come up) at the same time.

>This is the makings of a POWER OUTAGE EVENT for that building since the
>printers are not attached to a UPS.

>If all down, Possible Power Outage

Assuming you have some way of easily identifying your printers
- e.g. they are all on the same webpage in the Xymon display, or
they have some sort of standard name - then you can use the
'hobbitdboard' command to check the status of all of them at 
once.

E.g. if you have all the printers on a page called "printers",
then this would tell you if they were all down:

   #!/bin/sh

   # Grab "conn" status of all systems on "printers" page
   # Only pick the red and green ones, so we ignore those that
   # have been disabled.
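   $BB $BBDISP "hobbitdboard page=printers color=red,green test=conn fields=hostname,color" >/tmp/printstatus.$$
   # Read the file on stdin so wc prints only the count, not the filename
   PRINTERCOUNT=`wc -l </tmp/printstatus.$$`
   # Each output line looks like "hostname|red" or "hostname|green"
   DOWNCOUNT=`grep -c '|red$' /tmp/printstatus.$$`
   rm -f /tmp/printstatus.$$

   # Require at least one printer, so an empty page is not reported as an outage
   if test $PRINTERCOUNT -gt 0 && test $PRINTERCOUNT -eq $DOWNCOUNT
   then
      echo "All printers down!"
   fi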
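
If the printers share a naming convention instead of a page, a similar
count could be done by filtering the board output on the hostname. The
following is only a rough sketch: the count_down function and the "prn-"
prefix are made up for illustration, and it assumes the board output is
one "hostname|color" line per host, as with fields=hostname,color.

```shell
#!/bin/sh

# count_down PATTERN   (hypothetical helper, not part of Xymon)
# Reads "hostname|color" lines on stdin, keeps only the hosts whose name
# matches the regular expression PATTERN, and prints "TOTAL DOWN".
count_down() {
    awk -F'|' -v pat="$1" '
        $1 ~ pat { total++; if ($2 == "red") down++ }
        END { printf "%d %d\n", total, down }
    '
}

# Possible usage, assuming printers are named prn-something:
#   $BB $BBDISP "hobbitdboard color=red,green test=conn fields=hostname,color" \
#       | count_down '^prn-'
```

The two numbers can then be compared just like PRINTERCOUNT and DOWNCOUNT
in the page-based script.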

What I would do is feed the information from this script back into
Xymon as a new status - one that shows red if all printers are down,
and green if at least one of them is up. Then you can trigger the
alert from this status, instead of mucking about with the alert
scripts for each of the printers. So you could modify the script
above to become a Xymon server-side extension:

   #!/bin/sh

   # ... beginning is the same ...

   if test $PRINTERCOUNT = $DOWNCOUNT
   then
       # All printers are down
       $BB $BBDISP "status whitehouse.power red `date`
                    Possible power-loss at 1600 Pennsylvania Av"
   else
       # At least one printer is up
       $BB $BBDISP "status whitehouse.power green `date`
                   Power OK"
   fi


And then set up an alert that goes off when the "power" status for
host "whitehouse" goes red.

You'd save the script as $BBHOME/ext/powercheck.sh and have hobbitlaunch
run it every so often. E.g. to run it every 5 minutes, add
   [powercheck]
       CMD $BBHOME/ext/powercheck.sh
       INTERVAL 5m
to hobbitlaunch.cfg


If you must check whether the change for each printer happened "recently"
(e.g. within the past 5 minutes, which is the default network test
frequency), then you can add "lastchange" to the list of fields
retrieved in the hobbitdboard command. That will give you the Unix
timestamp when the status last changed; you can then have the script
compare that to the current timestamp and do whatever is appropriate,
depending on how recently the change happened. (The GNU 'date' utility
can give you the current timestamp with "date +%s").
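
A rough sketch of that comparison, not a tested extension: it assumes the
board output then has three fields per line, "hostname|color|lastchange",
and the function name all_down_recently and the 300-second window are
just illustrative choices.

```shell
#!/bin/sh

# all_down_recently NOW WINDOW   (hypothetical helper, not part of Xymon)
# Reads "hostname|color|lastchange" lines on stdin. Succeeds (exit 0) only
# if there is at least one line, every line is red, and every status
# changed no more than WINDOW seconds before the timestamp NOW.
all_down_recently() {
    awk -F'|' -v now="$1" -v window="$2" '
        $2 != "red" || now - $3 > window { miss = 1 }
        END { exit !(NR > 0 && !miss) }
    '
}

# Possible usage, with $BB and $BBDISP set by the Xymon environment:
#   $BB $BBDISP "hobbitdboard page=printers color=red,green test=conn fields=hostname,color,lastchange" \
#       | all_down_recently "`date +%s`" 300 \
#       && echo "All printers went down within the last 5 minutes"
```

The same idea works for the "all came back up" case by testing for green
instead of red.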


Hope this gives you some inspiration to put this together.


Regards,
Henrik

-- 
Henrik Storner