[hobbit] strange graph behavior - random machines & graphs

Gary Baluha gumby3203 at gmail.com
Thu Nov 29 22:24:10 CET 2007


Unfortunately, no, I can't do this, as our Hobbit server monitors production
machines.  The data directory for the RRD files is SAN-mounted, and we
haven't had disk corruption issues before with this type of setup.

The strange thing is, this only started within the past week, and
unfortunately it seems to be spreading to more and more RRD graphs.
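
A less intrusive variant of the read test suggested below would be to read
only the RRD files themselves rather than the whole block device.  The
following is a minimal sketch in Python, not something from the Hobbit
distribution: the function name read_check and the assumption that the data
files end in .rrd are illustrative, and the data directory must be supplied by
the caller.  It only surfaces read errors reported by the OS; it does not
validate the RRD contents, and it still generates read I/O on the SAN mount.

```python
import os


def read_check(root, chunk_size=1024 * 1024):
    """Read every .rrd file under root and discard the data.

    Returns a list of (path, error message) pairs for files that could
    not be opened or read.  An empty list means every file was readable.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: the RRD data files use the .rrd extension.
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read and throw away the contents, much like
                    # dd if=FILE of=/dev/null would.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((path, str(exc)))
    return failures
```

Calling read_check() with the path of the RRD data directory returns the files
that failed to read, which could then be compared against the list of graphs
showing the strange values.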

On Nov 29, 2007 3:41 PM, Josh Luthman <josh at imaginenetworksllc.com> wrote:

> Can you run dd if=/dev/sda of=/dev/null against the disk where the data is
> stored?  If the problem is this random, I'm curious whether the fs is having
> problems.  My money is on a bug in the software or a bad
> disk/fs/controller.
>
>
> On 11/29/07, Gary Baluha <gumby3203 at gmail.com> wrote:
> >
> > > I don't know how many hosts are affected, percentage-wise, but it's
> > definitely not every host.  And for the hosts having the problem, it's not
> > even the same graphs that are having the problem.
> >
> > > On Nov 29, 2007 3:11 PM, Josh Luthman <josh at imaginenetworksllc.com> wrote:
> >
> > > Same OS at home?
> > >
> > > > Not sure if you mentioned this, but does that weird value show
> > > > up in all RRD graphs or just on a few hosts?
> > >
> > >
> > > On 11/29/07, Josh Luthman <josh at imaginenetworksllc.com> wrote:
> > > >
> > > > > Do they monitor the same devices?  I think there has to be some
> > > > > similarity between the two, as they had the same problem at the same time
> > > > > (this isn't 100% certain, but it's logically the first place to look).  Hardware
> > > > > isn't much of a concern here, as they don't communicate and the chance of both
> > > > > servers going bad on the same date is astronomically small.
> > > >
> > > > > Is there any kind of auto-updating service running on them?
> > > >
> > > > > On 11/29/07, Gary Baluha <gumby3203 at gmail.com> wrote:
> > > > >
> > > > > We only have two Hobbit servers, and it is affecting both
> > > > > machines.  No, these two Hobbit machines do _not_ communicate with each
> > > > > other in any way.
> > > > >
> > > > > > On Nov 29, 2007 2:33 PM, Josh Luthman <josh at imaginenetworksllc.com> wrote:
> > > > >
> > > > > > Is this problem not showing up on another Hobbit server?  Do the
> > > > > > two Hobbit servers with this problem communicate at all (share data/SNMP
> > > > > > traffic/etc)?
> > > > > >
> > > > > > On 11/29/07, Gary Baluha <gumby3203 at gmail.com> wrote:
> > > > > >
> > > > > > > > On Nov 29, 2007 12:01 PM, Josh Luthman <josh at imaginenetworksllc.com> wrote:
> > > > > > >
> > > > > > > > > This is completely beyond my knowledge, but the first places
> > > > > > > > > I would look are hardware problems, any recent changes (obviously =)
> > > > > > > > > and the similarities between the issues on those two Hobbit servers.
> > > > > > > >
> > > > > > >
> > > > > > > > That's the thing: there aren't any similarities between these
> > > > > > > > two machines.  They are on different hardware, different OSes, different
> > > > > > > > network segments, and monitor different hosts.
> > > > > > > >
> > > > > > > > There were some recent changes in the past month to one of the
> > > > > > > > Hobbit servers, with a bunch of custom RRD graphs added.  But this wasn't
> > > > > > > > done on the other Hobbit server.  The only change on the other Hobbit
> > > > > > > > server is that more HTML web checks were added; nothing out of the ordinary.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Josh Luthman
> > > > > > Office: 937-552-2340
> > > > > > Direct: 937-552-2343
> > > > > > 1100 Wayne St
> > > > > > Suite 1337
> > > > > > Troy, OH 45373
> > > > > >
> > > > > > Those who don't understand UNIX are condemned to reinvent it,
> > > > > > poorly.
> > > > > > --- Henry Spencer
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Josh Luthman
> > > > Office: 937-552-2340
> > > > Direct: 937-552-2343
> > > > 1100 Wayne St
> > > > Suite 1337
> > > > Troy, OH 45373
> > > >
> > > > Those who don't understand UNIX are condemned to reinvent it,
> > > > poorly.
> > > > --- Henry Spencer
> > > >
> > >
> > >
> > >
> > > --
> > > Josh Luthman
> > > Office: 937-552-2340
> > > Direct: 937-552-2343
> > > 1100 Wayne St
> > > Suite 1337
> > > Troy, OH 45373
> > >
> > > Those who don't understand UNIX are condemned to reinvent it, poorly.
> > > --- Henry Spencer
> > >
> >
> >
>
>
>