[Xymon] Serious trouble, red after yellow didn't page at all tonight

Josh Luthman josh at imaginenetworksllc.com
Wed Apr 6 05:55:00 CEST 2011


Did Xymon create the alert?  I believe there is a log specifically for this.
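
One rough way to check this from the data quoted below is to line up each red period in the history log against the timestamps in the notification entries. This is only a sketch: the field layout is inferred from the pasted lines, the helper names are made up for illustration, and the two logs are pasted in as strings rather than read from the Xymon log files.

```python
import re

# History entries as quoted in the message: "<date> <color> <epoch> [<duration>]"
HIST = """\
Tue Apr 5 17:32:35 2011 red 1302039155 300
Tue Apr 5 17:37:35 2011 green 1302039455 900
Tue Apr 5 17:52:35 2011 yellow 1302040355 599
Tue Apr 5 18:02:34 2011 red 1302040954 601
Tue Apr 5 18:12:35 2011 yellow 1302041555 1199
Tue Apr 5 18:32:34 2011 red 1302042754 900
Tue Apr 5 18:47:34 2011 yellow 1302043654 12002
Tue Apr 5 22:07:36 2011 red 1302055656 300
Tue Apr 5 22:12:36 2011 yellow 1302055956
"""

# Notification entries as quoted in the message (wrapped lines rejoined).
NOTIFY = """\
Tue Apr 5 17:34:28 2011 db0.other (10.100.4.51) techops[160] 1302039268 0
Tue Apr 5 17:34:28 2011 db0.com.other (10.100.4.51) alert1[162] 1302039268 0
Tue Apr 5 17:37:35 2011 db0.other (10.100.4.51) techops[160] 1302039455 0 300
Tue Apr 5 17:52:35 2011 db0.other (10.100.4.51) techops[160] 1302040355 0
Tue Apr 5 17:52:35 2011 db0.other (10.100.4.51) ticket[161] 1302040355 0
Tue Apr 5 18:04:18 2011 db0.other (10.100.4.51) techops[160] 1302041058 0
Tue Apr 5 18:04:18 2011 db0.other (10.100.4.51) alert1[162] 1302041058 0
Tue Apr 5 18:34:18 2011 db0.other (10.100.4.51) alert1[162] 1302042858 0
Tue Apr 5 18:34:18 2011 db0.other (10.100.4.51) alert2[163] 1302042858 0
Tue Apr 5 18:34:18 2011 db0.other (10.100.4.51) alert3[164] 1302042858 0
Tue Apr 5 18:45:02 2011 db0.other (10.100.4.51) alert1[162] 1302043502 0
Tue Apr 5 18:45:02 2011 db0.other (10.100.4.51) alert2[163] 1302043502 0
Tue Apr 5 18:45:02 2011 db0.other (10.100.4.51) alert3[164] 1302043502 0
"""


def parse_hist(text):
    """Return (color, start_epoch, duration_or_None) for each history line."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 7:
            continue  # not a history entry
        duration = int(parts[7]) if len(parts) > 7 else None
        rows.append((parts[5], int(parts[6]), duration))
    return rows


def parse_notify(text):
    """Return (recipient, epoch) for each notification line."""
    notes = []
    for line in text.splitlines():
        match = re.search(r"\) (\w+)\[\d+\] (\d+)", line)
        if match:
            notes.append((match.group(1), int(match.group(2))))
    return notes


def silent_red_periods(hist, notes):
    """Start epochs of red periods with no notification inside them."""
    silent = []
    for color, start, duration in hist:
        if color != "red":
            continue
        # A missing duration means the state is still open.
        end = start + duration if duration is not None else float("inf")
        if not any(start <= ts < end for _, ts in notes):
            silent.append(start)
    return silent


if __name__ == "__main__":
    for epoch in silent_red_periods(parse_hist(HIST), parse_notify(NOTIFY)):
        print("red period with no notification logged, starting at epoch", epoch)
```

If a red period shows up here, no notification was logged for it at all, which points at the alert module rather than at mail or SMS delivery.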
On Apr 5, 2011 11:45 PM, "Elizabeth Schwartz" <betsy.schwartz at gmail.com>
wrote:
> Yesterday we had a red after yellow page all the way up the hierarchy
> immediately. Today we had a red after yellow not page at ALL.
> It did page in BB (the test is going to both servers during this test
> period). We are running Xymon 4.3.0 and really hoping to go live ASAP.
>
> Here are the hist log entries; see it go red at 22:07 for five minutes:
>
> Tue Apr 5 17:32:35 2011 red 1302039155 300
> Tue Apr 5 17:37:35 2011 green 1302039455 900
> Tue Apr 5 17:52:35 2011 yellow 1302040355 599
> Tue Apr 5 18:02:34 2011 red 1302040954 601
> Tue Apr 5 18:12:35 2011 yellow 1302041555 1199
> Tue Apr 5 18:32:34 2011 red 1302042754 900
> Tue Apr 5 18:47:34 2011 yellow 1302043654 12002
> Tue Apr 5 22:07:36 2011 red 1302055656 300
> Tue Apr 5 22:12:36 2011 yellow 1302055956
>
> History shows critical status:
> Tue Apr 5 22:07:36 EDT 2011 OTHER Applications ( "mysqle1" ): CRITICAL
>
> And it paged and emailed earlier in the evening: (domain name elided).
> It paged correctly at 6:34 and 6:45 but nothing at 10:07:
>
> Tue Apr 5 17:34:28 2011 db0.other (10.100.4.51) techops[160] 1302039268 0
> Tue Apr 5 17:34:28 2011 db0.com.other (10.100.4.51) alert1[162] 1302039268 0
> Tue Apr 5 17:37:35 2011 db0.other (10.100.4.51) techops[160] 1302039455 0 300
> Tue Apr 5 17:52:35 2011 db0.other (10.100.4.51) techops[160] 1302040355 0
> Tue Apr 5 17:52:35 2011 db0.other (10.100.4.51) ticket[161] 1302040355 0
> Tue Apr 5 18:04:18 2011 db0.other (10.100.4.51) techops[160] 1302041058 0
> Tue Apr 5 18:04:18 2011 db0.other (10.100.4.51) alert1[162] 1302041058 0
> Tue Apr 5 18:34:18 2011 db0.other (10.100.4.51) alert1[162] 1302042858 0
> Tue Apr 5 18:34:18 2011 db0.other (10.100.4.51) alert2[163] 1302042858 0
> Tue Apr 5 18:34:18 2011 db0.other (10.100.4.51) alert3[164] 1302042858 0
> Tue Apr 5 18:45:02 2011 db0.other (10.100.4.51) alert1[162] 1302043502 0
> Tue Apr 5 18:45:02 2011 db0.other (10.100.4.51) alert2[163] 1302043502 0
> Tue Apr 5 18:45:02 2011 db0.other (10.100.4.51) alert3[164] 1302043502 0
>
>
> And here are lines 159-165 in the hobbit-alerts.cfg:
> HOST=%^db EXHOST=%.*dl2.example* SERVICE=other
> MAIL techops REPEAT=1d RECOVERED
> MAIL ticket REPEAT=1d COLOR=yellow # open ticket email
> MAIL alert1 REPEAT=10 COLOR=red,purple FORMAT=SMS# page onshift or oncall at start RED, rep every 10 minutes
> MAIL alert2 DURATION>20 REPEAT=10 COLOR=red,purple FORMAT=SMS# page secondary after 20 mins RED . Repevery 10 minutes
> MAIL alert3 DURATION>40 REPEAT=10 COLOR=red,purple FORMAT=SMS# page tertiary after 40 mins RED. Rep every 10mins
> MAIL alert4 DURATION>60 REPEAT=10 COLOR=red,purple FORMAT=SMS# page team after 60 mins RED. Rpt every 10mins
>
>
> I don't believe it was acked or signed out. It's a complex custom test.
> _______________________________________________
> Xymon mailing list
> Xymon at xymon.com
> http://lists.xymon.com/mailman/listinfo/xymon
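
For reasoning about which recipients the quoted rules would select, here is a simplified model of the lines that carry an explicit COLOR. It is an assumption-laden sketch, not Xymon's implementation: it treats DURATION>N as "the event has lasted more than N minutes", ignores REPEAT, RECOVERED, acks, and FORMAT, leaves out the techops line because it has no explicit COLOR, and does not model how Xymon decides when an event started. The `Rule` class and `eligible` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    recipient: str
    colors: FrozenSet[str]
    min_minutes: Optional[int] = None  # N from DURATION>N, None if absent


PAGING = frozenset({"red", "purple"})

# Transcribed from the quoted hobbit-alerts.cfg lines with a COLOR setting.
RULES = [
    Rule("ticket", frozenset({"yellow"})),
    Rule("alert1", PAGING),
    Rule("alert2", PAGING, 20),
    Rule("alert3", PAGING, 40),
    Rule("alert4", PAGING, 60),
]


def eligible(color: str, minutes: int) -> List[str]:
    """Recipients whose COLOR and DURATION conditions both hold."""
    return [
        rule.recipient
        for rule in RULES
        if color in rule.colors
        and (rule.min_minutes is None or minutes > rule.min_minutes)
    ]


if __name__ == "__main__":
    # A red status a few minutes old, as at 22:07 in the history log.
    print(eligible("red", 5))
```

Under this model a red status should select at least alert1 at any duration, so a red period with no notification at all is not explained by the DURATION thresholds alone.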