[hobbit] Critical Systems view loading problem

Gary Baluha gumby3203 at gmail.com
Fri Dec 28 15:41:59 CET 2007


Okay, I think I figured out the issue I was having.  It happened again this
morning, and when I looked at which alerts were showing as red (on the "all
non-green" page), I noticed a pattern: every time the critical systems page
showed an Internal Server Error, the same alerts were red on the non-green
page.

In short, the hosts look like "A-B-C-[1-4]".  I manually edited the
hobbit-nkview.cfg file to remove all of the staging machines
("A-stag-C-[1-4]"), which are all clones of the same entry.  I found that if
I just deleted the master clone entries, the critical systems page was
working.  As soon as I reverted the file to the previous version, the
problem came back.  I then deleted the clone entries again and re-added them,
and all is working fine now.

So it appears that where the documentation says "don't edit the
hobbit-nkview.cfg file manually", it means it ;-)  Still, it may be useful
to have some sort of script or something to run that can check the syntax of
the hobbit-nkview.cfg for errors.  I think that as long as the file is named
".cfg" and is in the same directory in hobbit as all the other configuration
files, some people will be tempted to manually edit the file.
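
A minimal sketch of such a check, in Python, might look like the following.
It is not part of Hobbit and does not reproduce Hobbit's own parser.  It
assumes the records are pipe-delimited (as the "|W:0800:1700|" example
quoted further down suggests), that "#" starts a comment, and that time
specs look like that example.  The function name check_nkview and the
parameters expected_fields and time_field are hypothetical; set them to
match the layout of your own file.  Lines without a "|" (clone entries, for
instance) are skipped, not validated.

```python
import re

# Assumed time-spec shape, modelled only on the "W:0800:1700" example
# quoted in this thread: a day part (*, W, or digits 0-6), then HHMM:HHMM.
TIME_SPEC = re.compile(r"^(\*|W|[0-6]+):\d{4}:\d{4}$")


def check_nkview(lines, expected_fields, time_field):
    """Return a list of (line_number, message) for suspicious records.

    expected_fields and time_field are hypothetical parameters; they are
    not taken from the Hobbit documentation.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines and (assumed) "#" comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Lines without a pipe are not records this sketch understands.
        if "|" not in line:
            continue
        fields = line.split("|")
        if len(fields) != expected_fields:
            problems.append(
                (number, "expected %d fields, found %d"
                 % (expected_fields, len(fields))))
            continue
        # An empty time field ("||") means no time restriction.
        for part in filter(None, fields[time_field].split(",")):
            if not TIME_SPEC.match(part):
                problems.append(
                    (number, "unrecognised time spec %r" % part))
    return problems
```

To use it, read the file with open(path).readlines(), pass the result in
together with the field count and time-field index that match your file, and
print whatever comes back.  An empty list only means that nothing this
sketch knows how to check looked wrong.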

On Dec 24, 2007 11:11 PM, Gary Baluha <gumby3203 at gmail.com> wrote:

> Actually, one more thing to add...  I take it back, there was an alert
> status change between when it was working and when it wasn't, and it is now
> again not working correctly.  Also, I just recalled that if I zeroed out the
> contents of the hobbit-nkview.cfg file, the Critical Systems page started
> working again (albeit with no alerts showing up).
>
>
> On Dec 24, 2007 11:09 PM, Gary Baluha <gumby3203 at gmail.com> wrote:
>
> > As I previously posted, I get this problem every now and then as well.
> > About a month back, the Critical Systems page suddenly became useless when
> > it got stuck on that "Internal Server Error" issue.  My co-worker came
> > across an apparent fix: the file permissions on the hobbit-nkview.cfg
> > file were wrong, and the --debug option in hobbitcgi.cfg for
> > hobbit-nkview.cfg was preventing the page from loading.  This now appears
> > NOT to be the case, because the eternal Internal Server Error problem is
> > back.  It seems it was just coincidence that he made the changes when the
> > Critical Systems page started working again.
> >
> > Also, while I was in the process of typing the above section, it appears
> > the Critical Systems page is working again.  I made absolutely no changes to
> > anything during this time.  Unfortunately now, as before, I cannot determine
> > any causal relationship.  Additionally, unlike Tracy's problem below, it
> > doesn't appear to be related to the alerts that are showing up either (I can
> > confirm that no alert statuses changed while I was writing this).
> >
> > I'm going to have to go with Tracy's assessment that it is a pointer
> > issue.  I do recall from my programming days that incorrect pointer usage
> > in the code can cause intermittent and non-reproducible errors...
> > Unfortunately, it's been a while since I've programmed in C/C++, and I
> > would have to spend a while with the code to see if this really is the
> > issue, and how to fix it.  All I know is, it sounds plausible.
> >
> > Anyone else have any ideas, or am I just going a little off the deep end
> > with this (which is quite possible)?
> >
> >
> > On Sep 17, 2007 4:43 PM, Tracy Di Marco White <gendalia at gmail.com>
> > wrote:
> >
> > > On 9/7/07, Henrik Stoerner <henrik at hswn.dk> wrote:
> > > > On Thu, Sep 06, 2007 at 09:30:58PM -0500, Tracy Di Marco White wrote:
> > > > > I'm getting an "Internal Server Error" and the error log shows
> > > > > "Premature end of script headers: hobbit-nkview.sh".  My problem
> > > > > seems to be related to a test being yellow right now, and right
> > > > > now being outside of the parameters of when the machine/test
> > > > > combo is critical.  If I change the critical time for the event
> > > > > from "|W:0800:1700|" to "||", the critical systems page comes up
> > > > > fine.  If I put the time constraints back, the page fails to come
> > > > > up again.  It started failing after 1700, although I didn't
> > > > > notice it for about 15 minutes.  Is anyone else seeing this
> > > > > problem?
> > > >
> > > > Interesting, it does sound like a bug. Could you send me that line
> > > > from the hobbit-nkview.cfg file ?
> > >
> >