[hobbit] paging with REPEAT problem...
olivier at qalpit.com
Mon Mar 28 01:38:55 CEST 2005
> Your hobbitd_alert process dies for some reason, and when restarting it
> has forgotten when it is next due to send out an alert.
>
> So why does it die ... the only reason I can come up with is that it
> catches a signal from a child-process. Could you try changing line 332
> of hobbitd/hobbitd_alert.c from
> sigaction(SIGPIPE, &sa, NULL);
> to
> signal(SIGPIPE, SIG_IGN);
>
> and let me know if that makes it keep on running? If it does, then
> the mail program that is launched to send the alerts does something
> weird with its I/O.
I've changed the code, but it keeps happening. From page.log:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 15:33:43 Worker process died with exit code 0, terminating
2005-03-27 15:33:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:33:43 Channel not available
2005-03-27 22:55:21 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Could not get shm of size 102400: No such file or directory
2005-03-27 22:58:15 Channel not available
2005-03-27 23:46:48 Worker process died with exit code 0, terminating
2005-03-27 23:46:48 Could not get shm of size 102400: No such file or directory
2005-03-27 23:46:48 Channel not available
2005-03-28 00:08:06 Worker process died with exit code 0, terminating
2005-03-28 00:08:07 Could not get shm of size 102400: No such file or directory
2005-03-28 00:08:07 Channel not available
I've been sending alerts using a script, so maybe the script is the culprit.
I've switched to just sending mail and will let you know if it still happens.
BTW, I've just realized that a rule was using a macro that didn't exist... I
don't think that's a problem, is it?
In enadis.log (which I suppose is the enable/disable log)
I got these too:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 19:35:17 Worker process died with exit code 0, terminating
2005-03-27 19:35:17 Could not get shm of size 102400: No such file or directory
2005-03-27 19:35:17 Channel not available
I was not playing with maintenance (though I do have a couple of DOWNTIME
entries in bb-hosts), so what could be going on here?
--
olivier