[hobbit] [re-post] xymon notifications

Ralph Mitchell ralphmitchell at gmail.com
Wed Jun 3 18:45:16 CEST 2009


On Tue, Jun 2, 2009 at 2:18 PM, J Sloan <joe at tmsusa.com> wrote:

> T.J. Yang wrote:
> > Hi, joe
> >
> > We are also running two xymon servers across a WAN.
> > Here is a brief description of how we did it.
> >
> > 1. xymon1 is the primary one and xymon2 is the standby one, which is dumb (not alerting).
> > 2. all the clients send xymon messages to both xymon1 and xymon2.
> > 3. on xymon2 (standby),
> >     1. we have a cron entry to sync xymon1 config files every 5 minutes.
> >     2. there is a xymon2 heartbeat server-side external module to check the health of xymon1.
> >        if xymon1 is dead or not healthy, this module will enable xymon2 with [bbpage] section enabled.
> >     3. heartbeat server-side module will disable its alerting once xymon1 is back online.
> >
> > So we have a semi-auto fail-over architecture, but we have to accept the
> > loss of metrics information on xymon1 during its downtime.
> >
> > Keeping two xymon servers in sync on the same LAN is easy using HA/clustering software,
> > but keeping two xymon servers in sync across two far-apart WANs is not easy. I
> > heard Sun's clustering software has a new feature to enable clustering over
> > WANs, but I haven't studied this myself.
> >
> T.J. -
>
> Thanks for your insights. Your setup sounds like an engineering tour de
> force, but our needs are much simpler than that - no cluster is needed in
> our environment, the redundant xymon servers are providing all the
> reliability we need and more. In fact, a cluster would be hard to implement
> since the corresponding xymon servers are in separate networks, hundreds of
> miles apart.
>
> Our problem with xymon is all the duplicated alerts. If there were some way
> to get xymon to emulate big brother in this regard it would be ideal.
>
> The ideas posted here so far have merit, but I'm still trying to think
> through all the options to come up with the simplest way to suppress the
> duplicate alerts without introducing a new single point of failure.


I once had an old Compaq desktop system and a laptop, both running Gentoo Linux
with 'heartbeat' installed.  Whenever I shut down the laptop, the Compaq
'acquired' its IP address, so that it wouldn't be given away to anyone else
on that segment of the company network.  It wasn't anything fancy, just
heartbeat packets being exchanged over the network every few seconds.

As long as your two xymon servers are sending each other status messages,
you could use that for the heartbeat.  Something like this in an external
script:

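     # Time xymon1's bbd status was last logged (epoch seconds)
     X=`server/bin/bb localhost 'hobbitdboard host=xymon1 test=bbd fields=logtime'`
     Y=`date +%s`
     # An empty answer counts as "never seen", so Z is huge and paging turns on
     Z=`expr $Y - ${X:-0}`
     if [ $Z -ge 600 ]; then
        # no update for 10 minutes: do stuff to enable paging
        :
     fi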
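If you want the age calculation to be a bit more defensive, here is a rough sketch of it as a function. It assumes, as the arithmetic above already does, that logtime comes back as epoch seconds; the name heartbeat_age is just something I made up for illustration. It refuses to do the subtraction when the query returned nothing or something non-numeric, so the caller can decide what a failed query should mean.

```shell
#!/bin/sh
# heartbeat_age LOGTIME [NOW]   (hypothetical helper, not part of xymon)
# Prints how many seconds ago LOGTIME (epoch seconds) was, relative to NOW
# (defaults to the current time). Returns 1 without printing anything if
# LOGTIME is empty or contains non-digit characters.
heartbeat_age() {
    logtime=$1
    now=${2:-$(date +%s)}
    case "$logtime" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Shell arithmetic instead of expr: expr exits non-zero when the
    # result is 0, which would look like a failure to the caller.
    echo $((now - logtime))
}
```

You would feed it the output of the hobbitdboard query, e.g. AGE=`heartbeat_age "$X"`, and treat a non-zero return the same as a stale heartbeat if that suits your setup.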

If you were using a script to send the pages out, enabling paging could be
as simple as "touch $BBTMP/pager", then in the pager script, do this:

     if [ -f "$BBTMP/pager" ]; then
        # send the page
        :
     fi
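One thing the snippets above leave open is turning paging back off once xymon1 is healthy again. A minimal sketch, assuming the flag-file approach and the 600-second threshold from above (update_pager_flag is a made-up name, and the flag path is whatever your pager script checks):

```shell
#!/bin/sh
# update_pager_flag AGE THRESHOLD FLAGFILE   (hypothetical helper)
# Creates FLAGFILE when AGE (seconds since the primary's last status
# update) has reached THRESHOLD, and removes it otherwise, so the standby
# goes quiet again as soon as the primary is reporting.
update_pager_flag() {
    age=$1
    threshold=$2
    flag=$3
    if [ "$age" -ge "$threshold" ]; then
        touch "$flag"
    else
        rm -f "$flag"
    fi
}
```

Called from the external script as something like update_pager_flag "$Z" 600 "$BBTMP/pager", this keeps the decision in one place and the pager script only ever has to test for the file.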

Ralph Mitchell