[Xymon] Hierarchical alerting

Dugan, Darin D [EIT] dddugan at iastate.edu
Wed Mar 30 18:03:42 CEST 2011


Phil,
In addition to the 'depends' tag suggested by Chris Morris, you may also want to consider the 'route' tag. I find it simpler to use for WAN sites, though the behavior is somewhat different. Whereas 'depends' makes dependent tests go clear, 'route' makes conn tests go yellow with a message that the host is down because device ABC is down. Something like:

1.2.3.4	wan_router	# conn
1.2.3.5	site_router	# conn route:wan_router
1.2.3.6	site_switch	# conn route:wan_router,site_router
1.2.3.7	print_server	# conn ftp route:wan_router,site_router,site_switch

So if wan_router goes down, the other three will go yellow and therefore not alert. When a tech looks at the yellow status, it will plainly state that the host is down because wan_router is down.
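
As a rough illustration of the coloring rule described above, here is a small Python model. It is only a sketch, not Xymon's implementation: the `ROUTES` table mirrors the example entries, and the function name `conn_color` and the exact message wording are assumptions made for the example.

```python
# Hypothetical model of the 'route' behavior described above.
# This is NOT Xymon code; it only illustrates the coloring rule.

# Route lists as in the example entries, nearest-to-server first.
ROUTES = {
    "wan_router": [],
    "site_router": ["wan_router"],
    "site_switch": ["wan_router", "site_router"],
    "print_server": ["wan_router", "site_router", "site_switch"],
}


def conn_color(host, unreachable, routes=ROUTES):
    """Return (color, reason) for the conn test of `host`.

    `unreachable` is the set of hosts that currently fail the ping test.
    """
    if host not in unreachable:
        return "green", "host responds"
    # Blame the first device on the route that is itself unreachable.
    for router in routes.get(host, []):
        if router in unreachable:
            return "yellow", f"down because {router} is down"
    # No device on the route is down, so the host itself is the problem.
    return "red", "host does not respond"


if __name__ == "__main__":
    # wan_router fails, so nothing behind it answers either.
    down = set(ROUTES)
    for name in ROUTES:
        print(name, *conn_color(name, down))
```

In this model only wan_router comes out red when the WAN link fails; the three hosts behind it come out yellow and name wan_router as the cause.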

Cheers.

-----Original Message-----
From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Phil Crooker
Sent: Tuesday, March 29, 2011 6:57 PM
To: xymon at xymon.com
Subject: [Xymon] Hierarchical alerting

Hi All,

I'm in the process of converting our old Big Brother monitor to Xymon. I had a look at Zabbix, Nagios, Groundworks, and PandoraFMS and found too many quirks, cruft and gotchas (plus the problem of open core systems). I returned to the BB clones and found that Xymon is both active and maintained. I must say Xymon is a well thought out, well executed step beyond Big Brother. Well done, Henrik.

Is there a way to have a hierarchy for alerts? For example, say we have a branch office with a print server, file server, router, switch, etc. I have a Xymon entry for each (and multiple tests for each, i.e. polling the ftp port on the file server as well as the basic conn monitoring). If the WAN link to the site dies, I get alerts for all of the above, whereas I should really get just one alert for the router. Or if the file server dies, I shouldn't get both ftp and file server alerts, just the one for the server.

I know this wasn't available for Big Brother and I can't find anything in Xymon. Does this exist, and/or has it been considered?

For our Big Brother system, I created a Perl script daemon that reads the allevents log to build a "stateful" table of all monitored items. I also have a text file describing the relationships between the monitored items, using tabs:

core_switch
         wan_router
                 site_router
                         site_switch
                                 printer
                                 server
                                          ftp
         bbd_host
         local_server
                 smtp
         user_subnet_switch
                 printer

This table is read into a hashed array, the allevents log file is tailed to keep the state table current, and the script awaits queries.
When an alertable event comes in, the alert subsystem queries this daemon (using another Perl script). The daemon checks whether any parent items are in a red state; if so, the alert is discarded.
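
The suppression check described above could be sketched roughly as follows. This is a Python illustration only, not the original Perl daemon: the function names, the tuple-based paths and the sample outline are assumptions made for the example, and tailing the allevents log is left out.

```python
# Hypothetical sketch of the parent-suppression idea described above.
# It is not the original Perl daemon; names and layout are assumptions.

def parse_hierarchy(text):
    """Return one path (a tuple of names, root first) per outline line.

    Paths are used instead of bare names because the same name
    (for example 'printer') can appear under several parents.
    """
    paths = []
    stack = []  # (indent, path) for each currently open ancestor
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        # Close ancestors at the same or a deeper indentation level.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent_path = stack[-1][1] if stack else ()
        path = parent_path + (line.strip(),)
        paths.append(path)
        stack.append((indent, path))
    return paths


def should_alert(path, red_items):
    """Alert only if no ancestor of the item is currently red."""
    return not any(ancestor in red_items for ancestor in path[:-1])


if __name__ == "__main__":
    outline = (
        "core_switch\n"
        "\twan_router\n"
        "\t\tsite_router\n"
        "\t\t\tsite_switch\n"
        "\t\t\t\tprinter\n"
        "\tlocal_server\n"
        "\t\tsmtp\n"
    )
    red = {"wan_router", "site_router", "printer"}  # current state table
    for p in parse_hierarchy(outline):
        if p[-1] in red:
            print("/".join(p), "alert" if should_alert(p, red) else "discard")
```

With wan_router red, only the wan_router event would be passed on; the site_router and printer events are discarded because an ancestor is already red.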

This is effectively just a plugin and could be done more efficiently if integrated into the bbd. In terms of config files, having to maintain two files is not ideal. Currently the host listing and web layout are effectively combined in one file, so adding a hierarchy as above would be tricky. Perhaps a second "#" with the parent following, i.e.:

            1.2.3.4        hostname.domain.com        # smtp dns # core_switch.domain.com
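
To make the suggested syntax concrete, here is a minimal Python sketch of how such a line might be split. The second "#" field is only the proposal made in this message, not an existing Xymon feature, and the function name `parse_host_line` and the returned dictionary layout are hypothetical.

```python
# Hypothetical parser for the PROPOSED syntax above (a second '#'
# followed by the parent host). This is not an existing Xymon format.

def parse_host_line(line):
    """Split 'IP hostname # tags # parent' into its parts."""
    fields, _, rest = line.partition("#")
    ip, hostname = fields.split()[:2]
    tags_part, _, parent_part = rest.partition("#")
    return {
        "ip": ip,
        "hostname": hostname,
        "tags": tags_part.split(),
        # No second '#' (or nothing after it) means no parent.
        "parent": parent_part.strip() or None,
    }


if __name__ == "__main__":
    entry = parse_host_line(
        "1.2.3.4   hostname.domain.com   # smtp dns # core_switch.domain.com"
    )
    print(entry["hostname"], "->", entry["parent"])
```

A line without the second "#" parses the same way with the parent set to None, so existing entries would not need to change under this scheme.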

Anyway, this is just an initial query about this issue.

thanks and regards, Phil

