[hobbit] paging with REPEAT problem...
    olivier at qalpit.com 
       
    Mon Mar 28 01:38:55 CEST 2005
    
    
  
> Your hobbitd_alert process dies for some reason, and when it restarts it
> has forgotten when the next alert is due to be sent.
> 
> So why does it die ... the only reason I can come up with is that it
> catches a signal from a child-process. Could you try changing line 332
> of hobbitd/hobbitd_alert.c from
>    sigaction(SIGPIPE, &sa, NULL);
> to
>    signal(SIGPIPE, SIG_IGN);
> 
> and let me know if that makes it keep on running? If it does, then
> the mail program that is launched to send the alerts does something
> weird with its I/O.
I've changed the code, but the worker process still dies. This is what page.log shows:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 15:33:43 Worker process died with exit code 0, terminating
2005-03-27 15:33:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:33:43 Channel not available
2005-03-27 22:55:21 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Could not get shm of size 102400: No such file or directory
2005-03-27 22:58:15 Channel not available
2005-03-27 23:46:48 Worker process died with exit code 0, terminating
2005-03-27 23:46:48 Could not get shm of size 102400: No such file or directory
2005-03-27 23:46:48 Channel not available
2005-03-28 00:08:06 Worker process died with exit code 0, terminating
2005-03-28 00:08:07 Could not get shm of size 102400: No such file or directory
2005-03-28 00:08:07 Channel not available
I've been sending alerts using a script, so maybe the script is at fault.
I've changed the setup to just send mail and will let you know if it still happens.
Btw, I've just realized that a rule was using a macro that didn't exist... I
don't think that's a problem?
In enadis.log (which I suppose is enable/disable) I got the same messages:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 19:35:17 Worker process died with exit code 0, terminating
2005-03-27 19:35:17 Could not get shm of size 102400: No such file or directory
2005-03-27 19:35:17 Channel not available
I was not playing with maintenance (though I do have a couple of DOWNTIME entries in
bb-hosts), so what could be going on here?
--
olivier
    
    