[hobbit] wake up call
Josh Luthman
josh at imaginenetworksllc.com
Wed May 21 18:07:05 CEST 2008
After those three mornings, would you mind commenting out those hosts to be
certain that this reproduces the issue?
On Wed, May 21, 2008 at 12:02 PM, Gavin Leonard <gleonard at progrexion.com>
wrote:
> OK, well, it did not happen this morning after I added all of my monitored
> hosts to the /etc/hosts file. I just copied my bb-hosts file into my
> /etc/hosts file and modified it into the proper format. No pages this
> morning, so it could have been a DNS issue. If I am clear for three more
> mornings, then I will be satisfied. I will let you know.
>
> -Gavin
>
> *From:* Josh Luthman [mailto:josh at imaginenetworksllc.com]
> *Sent:* Tuesday, May 20, 2008 10:24 PM
> *To:* hobbit at hswn.dk
> *Subject:* Re: [hobbit] wake up call
>
> Thanks for the heads up. I am very interested in knowing the cause of, and
> more importantly the solution to, your issue, as it may fix mine!
>
> It would be VERY nice to be able to print out uptime and availability
> reports without the dozens of 1-minute outages. I know my issue is related
> to the box itself (hardware or software), as the issue appears on the
> Hobbit server itself.
>
> On Wed, May 21, 2008 at 12:17 AM, Gavin Leonard <gleonard at progrexion.com>
> wrote:
>
> Most if not all of my servers are defined by IP anyway. I have a very
> segmented network, so DNS is not very helpful across all the different
> domains and subnets; I use my hosts file for the most part. Now that I
> think of it, I wonder if the ones in the hosts file are still OK? I will
> let you know.
>
> -Gavin
>
> *From:* Phil Wild [mailto:philwild at gmail.com]
> *Sent:* Tuesday, May 20, 2008 7:12 PM
> *To:* hobbit at hswn.dk
> *Subject:* Re: [hobbit] wake up call
>
> Can I suggest you use IP addresses for a number of servers and see if they
> survive your next episode? That will give you an idea of where the
> problem might be...
>
> It is the least amount of work towards identifying the cause.
>
> Cheers
>
> Phil
>
> 2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <katherine.hosch at navy.mil>:
>
> Check for Apache log-rotation restarts in your cron jobs...
>
>
> -----Original Message-----
> From: Josh Luthman [mailto:josh at imaginenetworksllc.com]
> Sent: Tuesday, May 20, 2008 10:38
> To: hobbit at hswn.dk
> Subject: Re: [hobbit] wake up call
>
> What most people suggest is having a local DNS server on the Hobbitmon
> server itself.
>
> As this is happening at the same time every single day, I don't believe
> DNS would be the cause of the issue, though it is worth looking into
> until another idea comes along.
>
>
> On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard
> <gleonard at progrexion.com> wrote:
>
>
> Happened again this morning, so I am going to try a different
> DNS server.
>
> -Gavin
>
> From: Phil Wild [mailto:philwild at gmail.com]
> Sent: Monday, May 19, 2008 10:38 PM
> To: hobbit at hswn.dk
> Subject: Re: [hobbit] wake up call
>
> Hmmm... bummer, there goes that theory... If you are using IP
> addresses and you are still getting failures on these hosts, then DNS
> is not involved. A TTL of five minutes is fairly worthless for a caching
> server. It only helps if it hits the same device within five minutes; as
> Hobbit is pinging every five minutes (the default), you will most likely
> always be pulling from your master/slaves...
>
> Phil
>
> 2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>
> Well, almost all (a good 99%) of my hosts have the testip tag, so it
> doesn't need to look up the names. The things it does look up have 5m
> TTLs, though.
>
> On 5/19/08, Phil Wild <philwild at gmail.com> wrote:
> > What is the TTL set to for your domain? It would be interesting to see if
> > the issue reduces with a higher TTL. Another way to rule this out as the
> > source of the issue would be to set the DNS server up as a slave.
> >
> > Phil
> >
> > 2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
> >
> >> That was someone's theory in a very large post about this issue in the
> >> past. I did install a caching-only named on the box, and it did not
> >> fix the problem.
> >>
> >> It did relieve the stress on my other DNS server, though :)
> >>
> >> On 5/19/08, Phil Wild <philwild at gmail.com> wrote:
> >> > Hi Josh,
> >> >
> >> > This doesn't relate to the Apache error, it relates to your problem...
> >> > This is a theory...
> >> >
> >> > I am wondering if you are running a caching name server on your Hobbit
> >> > installation? If not, I am wondering if fping places too high a load
> >> > on your DNS server and misses the occasional host. Even with a caching
> >> > DNS server, you may see the issue every time the TTL expires.
> >> >
> >> > Phil
> >> >
> >> > 2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
> >> >
> >> >> Gavin,
> >> >>
> >> >> I am having a very similar issue, though it is not every single day.
> >> >> My issue is that every host (or almost every host) will have conn:red
> >> >> and then come back up ~60s later. I just confirmed this weekend that it
> >> >> is not related to the Via NIC (using an Intel Pro/100 S now).
> >> >>
> >> >> An issue like that is almost always Apache related. Can you post the
> >> >> errors in /var/log/httpd/error_log from this time period?
> >> >>
> >> >> Josh
> >> >>
> >> >> On Mon, May 19, 2008 at 3:26 PM, Gavin Leonard <gleonard at progrexion.com>
> >> >> wrote:
> >> >>
> >> >>> Every morning at 7am I get pages from every host I
> monitor including
> >> the
> >> >>> display server, that its connection recovered.. the it
> runs great for
> >> >>> the
> >> >>> next 23hrs. looking at hobbit web page I see no down
> time nor do the
> >> >>> servers show any down time. But when I click on the
> historical web
> >> link
> >> >>> to
> >> >>> see the info.. I get this.. I really love hobbit.. but I
> am not a Web
> >> >>> guy
> >> >>> at all and I think it might be apache related...
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> *Internal Server Error*
> >> >>>
> >> >>> The server encountered an internal error or
> misconfiguration and was
> >> >>> unable to complete your request.
> >> >>>
> >> >>> Please contact the server administrator, root at localhost
> and inform
> >> them
> >> >>> of the time the error occurred, and anything you might
> have done that
> >> may
> >> >>> have caused the error.
> >> >>>
> >> >>> More information about this error may be available in the
> server error
> >> >>> log.
> >> >>> ------------------------------
> >> >>>
> >> >>> *Apache/2.0.54 (Yellowdog) Server at misery.pgx.local
> Port 80*
> >> >>>
> >> >>> *Gavin Leonard*
> >> >>> Director, Systems-Network Engineering
> >> >>> *T* 801-828-1735
> >> >>> *F* 801-828-1704
> >> >>> *E* gleonard at progrexion.com
> >> >>>
> >> >>> Research | Marketing | Sales Generation
> >> >>> *www.progrexion.com <http://www.progrexion.com/>*
> >> >>>
> >> >>> This email and its contents are confidential. If you are not the intended
> >> >>> recipient, delete this email and do not use or disclose the information
> >> >>> within this email or its attachments. Thank you.
> >> >>
> >> >>
> >> >> --
> >> >> Josh Luthman
> >> >> Office: 937-552-2340
> >> >> Direct: 937-552-2343
> >> >> 1100 Wayne St
> >> >> Suite 1337
> >> >> Troy, OH 45373
> >> >>
> >> >> Those who don't understand UNIX are condemned to reinvent it, poorly.
> >> >> --- Henry Spencer
> >> >
> >> > --
> >> > Tel: 0400 466 952
> >> > Fax: 0433 123 226
> >> > email: philwild AT gmail.com
> >> >
> >>
> >> To unsubscribe from the hobbit list, send an e-mail to
> >> hobbit-unsubscribe at hswn.dk
>
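Gavin's workaround above (pinning every monitored host in /etc/hosts so that no DNS lookups are needed) means reshaping bb-hosts lines into hosts-file lines by hand. A minimal Python sketch of that conversion, assuming a bb-hosts host line has the form `IP hostname # tags` and that directive lines such as `page` or `group` never start with an IP address. The function name `bbhosts_to_etc_hosts` and the command-line wrapper are hypothetical, and entries listed as 0.0.0.0 are skipped because they carry no fixed address to pin.

```python
import ipaddress
import sys


def bbhosts_to_etc_hosts(lines):
    """Turn bb-hosts host definitions into /etc/hosts-style "IP<TAB>name" lines.

    Assumption: a host line looks like "IP hostname # tags". Directive lines
    (page, group, include, ...), comments and blank lines are skipped.
    """
    entries = []
    for raw in lines:
        # Everything after the first "#" is Hobbit tags, not part of the address.
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, pure comment, or a one-word directive
        ip, hostname = fields[0], fields[1]
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # first field is not an IP, e.g. "page servers Servers"
        if addr.is_unspecified:
            continue  # 0.0.0.0: no fixed address to pin in /etc/hosts
        entries.append(f"{ip}\t{hostname}")
    return entries


if __name__ == "__main__":
    # Hypothetical usage: python3 bbhosts2hosts.py path/to/bb-hosts
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for entry in bbhosts_to_etc_hosts(fh):
                print(entry)
```

Review the output before appending it to /etc/hosts: with the usual "files before dns" resolver order, stale entries there would quietly override later DNS changes.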
More information about the Xymon mailing list