
Re: [hobbit] cpu alerts



What time should Hobbit consider the start-of-event time?  Some prefer
the current arrangement where it uses the time it goes non-green; others
prefer the time it goes to a color which triggers an alert.  I've heard
arguments both ways.

Thanks, Henrik. The server I have running 4.2.1RC1 behaves the way I want, and that is how I'd like the standby server to behave too. I just installed 4.2.1P1 on the standby, and there I'm still getting alerts whose duration includes the yellow time instead of starting when the status goes red/panic. Is there a way to make it work as 4.2.1RC1 does, so that it only sends an alert after 10 minutes of red/panic, without counting the yellow time in the duration? I'm also finding that I don't get recovery notices if the status goes from red to yellow and then to green.


On 8/8/06, Henrik Stoerner <henrik (at) hswn.dk> wrote:

On Tue, Aug 01, 2006 at 01:29:00PM -0400, Bill Perez wrote:
> >Could you show us a copy of the cpu history log (in
> >~hobbit/data/hist/HOSTNAME.cpu) compared with the notifications log
> >from ~hobbit/server/logs/notifications.log ?
>
> Here is the hostname.cpu and section from notifications.log for those alerts
> this morning:
>
> From /hobbit/data/hist/HOSTNAME.cpu
> Tue Aug 1 10:34:30 2006 yellow 1154442870 1200
> Tue Aug 1 10:54:30 2006 red 1154444070 600
> Tue Aug 1 11:04:30 2006 green 1154444670 299
> Tue Aug 1 11:09:29 2006 yellow 1154444969 301
> Tue Aug 1 11:14:30 2006 red 1154445270 301
> Tue Aug 1 11:19:31 2006 green 1154445571
>
> Tue Aug 1 10:54:30 2006 uswosfad.domain.com.cpu (10.128.40.31) b.perez (at) domain.com[175] 1154444070 200
> Tue Aug 1 11:04:30 2006 uswosfad.domain.com.cpu (10.128.40.31) b.perez (at) domain.com[175] 1154444670 200 1800
> Tue Aug 1 11:19:30 2006 uswosfad.domain.com.cpu (10.128.40.31) b.perez (at) domain.com[175] 1154445570 200
> Tue Aug 1 11:19:42 2006 uswosfad.domain.com.cpu (10.128.40.31) b.perez (at) domain.com[175] 1154445581 200 612
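The cpu history rows quoted above appear to follow a simple layout: a ctime-style date, the color, the epoch time of the change, and the number of seconds that color lasted (missing on the last, still-open row). That layout is inferred from the data itself, not taken from Hobbit documentation. The minimal Python sketch below assumes that layout. It parses the rows, checks that each duration equals the gap to the next change, and measures each event from its first non-green row:

```python
# Rows copied from the HOSTNAME.cpu history excerpt quoted above.
# Assumed layout (inferred from the data, not from Hobbit docs):
#   <ctime date: 5 fields> <color> <start epoch> [<duration in seconds>]
HISTORY = """\
Tue Aug 1 10:34:30 2006 yellow 1154442870 1200
Tue Aug 1 10:54:30 2006 red 1154444070 600
Tue Aug 1 11:04:30 2006 green 1154444670 299
Tue Aug 1 11:09:29 2006 yellow 1154444969 301
Tue Aug 1 11:14:30 2006 red 1154445270 301
Tue Aug 1 11:19:31 2006 green 1154445571
"""


def parse_history(text):
    """Return a list of (color, start_epoch, duration_or_None) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Fields 0-4 are the human-readable date; skip them.
        color = fields[5]
        start = int(fields[6])
        # The newest row has no duration because that status is still current.
        duration = int(fields[7]) if len(fields) > 7 else None
        entries.append((color, start, duration))
    return entries


def check_durations(entries):
    """True if every duration equals the gap to the next status change."""
    for (_, start, dur), (_, next_start, _) in zip(entries, entries[1:]):
        if dur != next_start - start:
            return False
    return True


def event_lengths(entries):
    """Seconds from the first non-green row of each event to the next green row."""
    lengths = []
    event_start = None
    for color, start, _ in entries:
        if color != "green" and event_start is None:
            event_start = start  # the event clock starts at the first non-green
        elif color == "green" and event_start is not None:
            lengths.append(start - event_start)
            event_start = None
    return lengths


entries = parse_history(HISTORY)
print(check_durations(entries))  # True
print(event_lengths(entries))    # [1800, 602]
```

The recovery lines in the notifications log end in 1800 and 612. In both cases that figure equals the line's own epoch time minus the epoch of the preceding yellow change (1154444670 - 1154442870 = 1800 and 1154445581 - 1154444969 = 612), which is consistent with the event being timed from the yellow status rather than from red.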

OK, Hobbit thinks the first event begins at 10:34 when the status
goes yellow. Even though this doesn't trigger an alert, it registers
this as the start time of the event. So when it goes red at 10:54, your
10 minute delay has already elapsed, and you get an immediate alert.
Then when it goes green at 11:04 you of course get a recovery notice.

The same thing happens when it goes yellow again at 11:09. No alert is sent, but
this time is registered as the start of the event. So at 11:14 when it
goes red you do not get an alert (11:09->11:14 is only 5 minutes), but
you do get the alert at 11:19:30 - and when it goes green at 11:19:31
it sends out a "recovered" message.
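The timing described above can be reproduced with a small model. The Python sketch below is a hypothetical simplification, not Hobbit's alerting code. It assumes that only red triggers the alert rule, that the delay is 10 minutes (600 seconds), and that an alert is due once the status is red and the delay has elapsed since the event start. It compares the current rule (the clock starts at the first non-green status) with the alternative of starting the clock at the first alert-triggering color:

```python
# Status changes from the HOSTNAME.cpu history excerpt: (epoch, color).
CHANGES = [
    (1154442870, "yellow"),  # 10:34:30
    (1154444070, "red"),     # 10:54:30
    (1154444670, "green"),   # 11:04:30
    (1154444969, "yellow"),  # 11:09:29
    (1154445270, "red"),     # 11:14:30
    (1154445571, "green"),   # 11:19:31
]

DELAY = 600              # the 10-minute alert delay, in seconds
ALERT_COLORS = {"red"}   # assumption: only red triggers this alert rule


def alert_times(changes, delay, clock_starts_at_alert_color):
    """Epoch of the first alert in each event, under a simplified model.

    Hypothetical model: an alert is due once the status is in ALERT_COLORS
    and at least `delay` seconds have passed since the event start. The event
    start is the first non-green change, or, if clock_starts_at_alert_color
    is True, the first change to an alert color. A real server evaluates
    this on its own schedule, so logged times may be slightly later.
    """
    alerts = []
    event_start = None
    alerted = False
    # Pair each change with the time of the next change (None = still open).
    ends = [t for t, _ in changes[1:]] + [None]
    for (t, color), t_end in zip(changes, ends):
        if color == "green":
            event_start, alerted = None, False  # event over; reset the clock
            continue
        if event_start is None:
            if not clock_starts_at_alert_color or color in ALERT_COLORS:
                event_start = t
        if color in ALERT_COLORS and not alerted and event_start is not None:
            due = max(t, event_start + delay)
            # Only alert if the status is still alerting when the delay expires.
            if t_end is None or due < t_end:
                alerts.append(due)
                alerted = True
    return alerts


print(alert_times(CHANGES, DELAY, False))  # [1154444070, 1154445569]
print(alert_times(CHANGES, DELAY, True))   # []
```

Under the current rule the model gives alerts at 1154444070 (10:54:30) and 1154445569 (11:19:29); the notifications log shows 10:54:30 and 11:19:30, the one-second difference presumably being when the server got around to checking. Under the alternative rule neither event alerts: the second red period lasted only 301 seconds. The first lasted exactly 600 seconds, so whether a real server would alert at that boundary depends on when it evaluates the rule; the sketch treats the boundary as not alerting.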

What time should Hobbit consider the start-of-event time?  Some prefer
the current arrangement where it uses the time it goes non-green; others
prefer the time it goes to a color which triggers an alert.  I've heard
arguments both ways.


Regards, Henrik

