[hobbit] paging with REPEAT problem...

olivier at qalpit.com olivier at qalpit.com
Mon Mar 28 01:38:55 CEST 2005


> Your hobbitd_alert process dies for some reason, and when it restarts it
> has forgotten when the next alert is due to be sent.
> 
> So why does it die ... the only reason I can come up with is that it
> catches a signal from a child-process. Could you try changing line 332
> of hobbitd/hobbitd_alert.c from
>    sigaction(SIGPIPE, &sa, NULL);
> to
>    signal(SIGPIPE, SIG_IGN);
> 
> and let me know if that makes it keep on running? If it does, then
> the mail program that is launched to send the alerts does something
> weird with its I/O.
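For reference, a minimal sketch of what the suggested one-line change does. It is not taken from hobbitd_alert.c; the function name write_to_closed_pipe is made up for illustration. With SIGPIPE ignored, writing to a pipe whose reader has gone away no longer kills the process; write() fails with EPIPE instead, and the caller can handle that.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns the errno reported by write() on a pipe that has no reader,
 * 0 if the write unexpectedly succeeded, or -1 if the pipe could not
 * be created.
 */
int write_to_closed_pipe(void)
{
    int fds[2];
    ssize_t n;
    int err;

    if (pipe(fds) == -1)
        return -1;

    /* The change proposed in the quoted mail: ignore SIGPIPE. */
    signal(SIGPIPE, SIG_IGN);

    /* Simulate a child (e.g. a mail program) that closed its input early. */
    close(fds[0]);

    /* Without SIG_IGN the default action would terminate the process here. */
    n = write(fds[1], "x", 1);
    err = (n == -1) ? errno : 0;

    close(fds[1]);
    return err;
}
```

Calling write_to_closed_pipe() from a small test program should return EPIPE rather than terminating the caller.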

I've changed the code, and it keeps happening. From page.log:

2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 15:33:43 Worker process died with exit code 0, terminating
2005-03-27 15:33:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:33:43 Channel not available
2005-03-27 22:55:21 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Could not get shm of size 102400: No such file or directory
2005-03-27 22:58:15 Channel not available
2005-03-27 23:46:48 Worker process died with exit code 0, terminating
2005-03-27 23:46:48 Could not get shm of size 102400: No such file or directory
2005-03-27 23:46:48 Channel not available
2005-03-28 00:08:06 Worker process died with exit code 0, terminating
2005-03-28 00:08:07 Could not get shm of size 102400: No such file or directory
2005-03-28 00:08:07 Channel not available
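A side note on the "No such file or directory" text, assuming the message comes from a System V shmget() call made without IPC_CREAT (I have not checked the hobbit source for this): that is the ENOENT error string, which shmget() reports when no segment exists for the requested key. The sketch below reproduces the condition; the key value and both function names are made up for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SEGMENT_SIZE 102400  /* size copied from the log lines above */

/* Hypothetical helper: look up an existing segment, never create one.
 * Returns 0 on success, otherwise the errno from shmget(). */
int lookup_segment(key_t key)
{
    int id = shmget(key, SEGMENT_SIZE, 0600);  /* note: no IPC_CREAT */
    if (id == -1) {
        int err = errno;
        fprintf(stderr, "shmget failed for size %d: %s\n",
                SEGMENT_SIZE, strerror(err));
        return err;
    }
    return 0;
}

/* Hypothetical demo: create a segment, remove it, then look it up again.
 * Returns the errno from the final lookup, or -1 if the setup failed. */
int demo_missing_segment(void)
{
    key_t key = 0x48424254;  /* arbitrary key, an assumption for this demo */
    int id = shmget(key, SEGMENT_SIZE, IPC_CREAT | IPC_EXCL | 0600);

    if (id == -1)
        return -1;           /* key already in use, or SysV shm unavailable */
    if (shmctl(id, IPC_RMID, NULL) == -1)
        return -1;

    /* The segment is gone, so a plain lookup fails with ENOENT. */
    return lookup_segment(key);
}
```

If that assumption holds, the log lines would mean the segment was already gone by the time the restarted module tried to attach to it.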


I've been sending alerts through a script, so maybe the script is at fault.
I've switched to just sending mail and will let you know if it still happens.


By the way, I've just realized that a rule was using a macro that doesn't
exist. I don't think that's a problem, is it?



In enadis.log (which I suppose is the enable/disable log) I got these too:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 19:35:17 Worker process died with exit code 0, terminating
2005-03-27 19:35:17 Could not get shm of size 102400: No such file or directory
2005-03-27 19:35:17 Channel not available

I was not playing with maintenance (though I do have a couple of DOWNTIME
entries in bb-host), so what could be going on here?



--
olivier


