[hobbit] DURATION rules for specific host alerts
Gary Baluha
gumby3203 at gmail.com
Fri Jun 22 19:36:47 CEST 2007
On 6/22/07, Daniel Bourque <dbourque at weatherdata.com> wrote:
>
> Why would you not want the status to change ? Such a history log is great
> for troubleshooting.
>
I wouldn't want the status to change because I'm essentially making it a
two-part threshold: one part based on the hard numeric value, and the
other based on the length of time.
> If you don't want to be notified about it, just use this in
> hobbit-alerts.cfg:
>
> Page=x
> IGNORE HOST=foo SERVICE=cpu COLOR=red DURATION<5m
>
Ahh, that's the sort of hobbit-alerts rule that would work for me, at least
until (if?) there is a way to do what I'm looking for in
hobbit-clients.cfg.
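For reference, the effect of that IGNORE rule can be modeled in a few lines of Python. This is a minimal sketch, not how Hobbit actually evaluates hobbit-alerts.cfg: the function `ignore_rule_matches` is hypothetical, and it assumes DURATION counts from the moment the cpu status turned red. The status still goes red on the web page right away; the rule only holds back the notification.

```python
from datetime import datetime, timedelta

# Hypothetical model of the hobbit-alerts.cfg rule
#   IGNORE HOST=foo SERVICE=cpu COLOR=red DURATION<5m
# This is not Hobbit's code. It assumes DURATION is measured from the
# moment the status turned red; the status page itself still shows red.
IGNORE_WINDOW = timedelta(minutes=5)


def ignore_rule_matches(host: str, service: str, color: str,
                        color_since: datetime, now: datetime) -> bool:
    """Return True while the IGNORE rule would suppress notifications."""
    duration = now - color_since  # how long the status has had this color
    return (host == "foo" and service == "cpu"
            and color == "red" and duration < IGNORE_WINDOW)


t0 = datetime(2007, 6, 22, 12, 0)
print(ignore_rule_matches("foo", "cpu", "red", t0, t0 + timedelta(minutes=2)))  # True: still inside 5m
print(ignore_rule_matches("foo", "cpu", "red", t0, t0 + timedelta(minutes=6)))  # False: alert goes out
```

Under that assumption, the clock restarts whenever the status drops out of red, so a load that keeps bouncing just above and below the red threshold could stay inside the 5-minute window and never trigger a page.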
> If you don't want it to change the status color on the parent pages, then
> use NOPROPYELLOW:cpu in the bb-hosts file.
>
> If you REALLY don't want it to change status, increase the LOAD numbers in
> the hobbit-clients.cfg file.
>
The catch is that the load is only a problem if it is _sustained_ for more
than 10 minutes or so.
If I set the red threshold to Y, and the load momentarily spikes to Y+1, it
isn't a problem. But if I raise the threshold to Y+2 and now I get a
sustained load of Y+1, it would be a problem since I wouldn't get alerted.
Essentially, I'm looking for a sort of time-based hysteretic monitoring.
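A minimal Python sketch of that kind of time-based hysteresis, assuming the status is recomputed each time a new load sample arrives. `SustainedThreshold` and its `update` method are hypothetical and not part of Hobbit or hobbit-clients.cfg; the 20/30 thresholds and the 5-minute hold are taken from the `LOAD 20 30 DURATION>5m` example in the original question. A level is reported only once every sample over at least the last `hold` period has exceeded that level's threshold; any sample at or below it restarts the clock.

```python
from datetime import datetime, timedelta
from typing import Optional


class SustainedThreshold:
    """Hypothetical time-based hysteresis for one metric (e.g. CPU load)."""

    def __init__(self, warn: float, panic: float, hold: timedelta):
        self.warn = warn      # yellow threshold
        self.panic = panic    # red threshold
        self.hold = hold      # how long a level must be sustained
        # Start time of the current unbroken run above each threshold.
        self.above_warn_since: Optional[datetime] = None
        self.above_panic_since: Optional[datetime] = None

    def update(self, now: datetime, value: float) -> str:
        # Extend or reset the run above the warning threshold.
        if value > self.warn:
            if self.above_warn_since is None:
                self.above_warn_since = now
        else:
            self.above_warn_since = None
        # Extend or reset the run above the panic threshold.
        if value > self.panic:
            if self.above_panic_since is None:
                self.above_panic_since = now
        else:
            self.above_panic_since = None

        # Report a level only after it has been held for `hold`.
        if (self.above_panic_since is not None
                and now - self.above_panic_since >= self.hold):
            return "red"
        if (self.above_warn_since is not None
                and now - self.above_warn_since >= self.hold):
            return "yellow"
        return "green"


t0 = datetime(2007, 6, 22, 12, 0)
cpu = SustainedThreshold(warn=20, panic=30, hold=timedelta(minutes=5))
for i, load in enumerate([35, 12, 35, 35, 25]):  # one sample every 5 minutes
    print(cpu.update(t0 + timedelta(minutes=5 * i), load))
# prints green, green, green, red, yellow (one per line)
```

With samples 5 minutes apart, a single spike to 35 never leaves green, while two consecutive samples above 30 turn the status red. The design choice here is to delay the status change itself, which is what the proposed hobbit-clients.cfg syntax asks for, rather than only delaying the alert as the hobbit-alerts.cfg DURATION rule does.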
> -Dan
>
> Gary Baluha wrote:
>
> Is there a [non-messy] way to set a DURATION rule for a specific host
> alert? Basically, what I'm thinking of is something like this:
>
> In hobbit-clients.cfg
> HOST=myhost
> LOAD 20 30 DURATION>5m
>
> The effect being, the status of the "myhost" cpu alert will only change to
> yellow/red if the load is above the appropriate threshold for more than 5
> minutes.
>
> There are a few hosts that occasionally will spike above the cpu load
> thresholds, but only for a few minutes (usually around 5 min at most), and
> then recover on their own. However, I don't want to raise the thresholds,
> because a sustained load (more than 10 minutes) at this level _is_ actually
> a critical event. It's just not critical if it is only a momentary spike.
>
> My specific example is with cpu load, but it could be for other things
> too, such as process counts, memory, or even in some situations, disk space.
>
>
More information about the Xymon mailing list