[hobbit] procs test keeps paging, although green for +4 days

David Gore David.Gore at verizonbusiness.com
Fri Aug 18 00:14:48 CEST 2006


Henrik Stoerner wrote:
> On Thu, Aug 17, 2006 at 04:05:38PM +0000, David Gore wrote:
>   
>> Interesting problem: we keep getting paged for a procs test on a host 
>> twice a day, although the procs test has been green for the last 4+ 
>> days.  It would appear something is stuck?  What to look at?  It is in 
>> a couple of 'chk' files in ~/server/tmp, but I wouldn't know what to 
>> make of that.  Ideas to troubleshoot?
>>     
>
> The alert.chk.sub file contains the recipients of the alerts currently
> active - there is one line for each recipient. E.g.
>
>    1157011746|myserver|sslcert|mail|henrik at test.com
>
> The alert.chk file has one line for every status that is in a
> potentially alerting state, i.e. it is red, yellow or purple. E.g.
>
> myserver|sslcert|mysite/webservers|10.0.36.166|yellow|1155640943|1157011746|paging|status ...
>
> The field that says "paging" has "norecip" if the status doesn't have
> any alert-recipients defined, or if the alerts are restricted, e.g. to
> a certain time of day.
>
>
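As a rough illustration of the alert.chk layout described above, here is a
minimal Python sketch that splits one line into its pipe-separated fields and
looks up the entries for a given host and test. The field names (host, test,
page, ip, color, time_a, time_b, state, rest) are hypothetical labels chosen
for this sketch; the message does not say what the two numeric fields mean, so
they are kept as opaque strings.

```python
from typing import Iterable, List, NamedTuple


class AlertEntry(NamedTuple):
    # Field names are illustrative labels, not official Hobbit names.
    host: str
    test: str
    page: str
    ip: str
    color: str
    time_a: str   # first numeric field; meaning not stated in the message
    time_b: str   # second numeric field; meaning not stated in the message
    state: str    # "paging" or "norecip" per the description above
    rest: str     # remainder of the line (status text)


def parse_alert_chk_line(line: str) -> AlertEntry:
    """Split one alert.chk line into its nine pipe-separated fields."""
    parts = line.rstrip("\n").split("|", 8)
    if len(parts) < 9:
        raise ValueError("expected 9 fields, got %d" % len(parts))
    return AlertEntry(*parts)


def entries_for(lines: Iterable[str], host: str, test: str) -> List[AlertEntry]:
    """Return the parsed entries that match the given host and test."""
    found = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = parse_alert_chk_line(line)
        if entry.host == host and entry.test == test:
            found.append(entry)
    return found
```

Feeding the lines of alert.chk to entries_for() with the host and "procs"
would show whether a status that is currently green still has an entry in the
file, which is the symptom reported here.
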
> I've never seen it happen, but there is a very small time window between
> the startup of the hobbitd daemon and the startup of the hobbitd_alert
> module where a green update is registered with hobbitd, but it doesn't
> make it to the hobbitd_alert module - and then you have this situation.
>
> Restarting the hobbitd_alert module - just kill the hobbitd_alert
> process, it will restart automatically - should clean it up, logging a
> message like "Stale alert for HOSTNAME:TEST dropped" to the page.log
> file.
>
>   
Killed hobbitd_alert; it restarted and dropped the bogus alert, as you 
said it would.  Thank you!
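
To confirm the cleanup after such a restart, one could scan the log for the
"Stale alert for HOSTNAME:TEST dropped" message quoted above. The following is
a small Python sketch under the assumption that the message appears on a
single line in that form; the function name and the way the log lines are
supplied are hypothetical, and anything else on the line is ignored.

```python
import re
from typing import Iterable, List, Tuple

# Matches the message text quoted in the thread; anything before or after it
# on the line (for example a timestamp) is ignored.
STALE_RE = re.compile(r"Stale alert for (\S+?):(\S+) dropped")


def find_stale_drops(log_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (hostname, test) pairs for every stale-alert message found."""
    drops = []
    for line in log_lines:
        match = STALE_RE.search(line)
        if match:
            drops.append((match.group(1), match.group(2)))
    return drops
```

Passing the lines of page.log (for instance from an open file object) gives
the list of host/test pairs whose stale alerts were dropped.
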


~David


