[hobbit] wake up call

Josh Luthman josh at imaginenetworksllc.com
Wed May 21 18:07:05 CEST 2008


After those three mornings, would you mind commenting out those hosts to be
certain that doing so reproduces the issue?
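
For anyone following along, the bb-hosts to /etc/hosts conversion Gavin
describes could be sketched roughly like this. It is only an illustration, not
what he actually ran: the function name bbhosts_to_etc_hosts is made up, and it
assumes host entries look like "IP hostname # tags" and that an IP of 0.0.0.0
marks a host that is meant to be resolved through DNS.

```python
import ipaddress


def bbhosts_to_etc_hosts(text):
    """Return /etc/hosts-style lines for the host entries found in bb-hosts text."""
    out = []
    for raw in text.splitlines():
        # Assumption: everything after '#' is the tag list, which /etc/hosts
        # does not need, so it is dropped.
        entry = raw.split("#", 1)[0].split()
        if len(entry) < 2:
            continue  # blank line, comment, or a single-word line
        ip, name = entry[0], entry[1]
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # not a host entry (e.g. a "page" or "group" line)
        if addr.is_unspecified:
            continue  # assumed to mean "resolve via DNS", so nothing to pin
        out.append(f"{ip}\t{name}")
    return out


if __name__ == "__main__":
    sample = (
        "page servers Servers\n"
        "10.0.0.5 web01.example.com # http://web01.example.com/ conn\n"
        "192.168.1.10 db01 # testip\n"
    )
    print("\n".join(bbhosts_to_etc_hosts(sample)))
```

Prefixing the generated lines with "#" afterwards is one way to comment them
out again for the reproduction test.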

On Wed, May 21, 2008 at 12:02 PM, Gavin Leonard <gleonard at progrexion.com>
wrote:

>  OK, it did not happen this morning after I added all of my monitored
> hosts to the /etc/hosts file. I just copied my bb-hosts file into
> my /etc/hosts file and modified it into the proper format. No pages this
> morning, so it could have been a DNS issue. If I am clear for three more
> mornings then I will be satisfied. I will let you know.
>
>
>
> -Gavin
>
>
>
> *From:* Josh Luthman [mailto:josh at imaginenetworksllc.com]
> *Sent:* Tuesday, May 20, 2008 10:24 PM
>
> *To:* hobbit at hswn.dk
> *Subject:* Re: [hobbit] wake up call
>
>
>
> Thanks for the heads up.  I am very interested in knowing what is the cause
> and more importantly the solution to your issue, as it may fix mine!
>
> It would be VERY nice to be able to print out uptime and availability reports
> without the dozens of 1 minute outages.  I know my issue is related to the
> box itself (hardware or software) as the issue appears on the hobbit server
> itself.
>
> On Wed, May 21, 2008 at 12:17 AM, Gavin Leonard <gleonard at progrexion.com>
> wrote:
>
> Most if not all of my servers are defined by IP anyway. I have a very
> segmented network, so DNS is not very helpful across all the different
> domains and subnets. I use my hosts file for the most part. Now that I
> think of it, I wonder if the ones in the hosts file are still OK? I will let
> you know.
>
>
>
> -Gavin
>
>
>
> *From:* Phil Wild [mailto:philwild at gmail.com]
> *Sent:* Tuesday, May 20, 2008 7:12 PM
>
>
> *To:* hobbit at hswn.dk
> *Subject:* Re: [hobbit] wake up call
>
>
>
> Can I suggest you use IP addresses for a number of servers and see if they
> survive through your next episode. That will give you an idea of where the
> problem might be...
>
>
>
> It is the least amount of work towards identifying the cause.
>
>
>
> Cheers
>
>
>
> Phil
>
> 2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <katherine.hosch at navy.mil>:
>
> Check your apache log restarts in cron....
>
>
> -----Original Message-----
> From: Josh Luthman [mailto:josh at imaginenetworksllc.com]
> Sent: Tuesday, May 20, 2008 10:38
> To: hobbit at hswn.dk
> Subject: Re: [hobbit] wake up call
>
> What most people suggest is having a local DNS server, on the Hobbitmon
> server itself.
>
> As this is happening at the same time every single day I don't believe
> DNS would be the cause of the issue, though it is worth taking a look at
> until another idea comes along.
>
>
> On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard
> <gleonard at progrexion.com> wrote:
>
>
>        Happened again this morning.. so I am going to try a different
> dns server.
>
>
>
>        -Gavin
>
>
>
>        From: Phil Wild [mailto:philwild at gmail.com]
>        Sent: Monday, May 19, 2008 10:38 PM
>        To: hobbit at hswn.dk
>        Subject: Re: [hobbit] wake up call
>
>
>
>        Hmmm... bummer, there goes that theory... If you are using IP
> addresses, and you are still getting failures on these hosts, then DNS
> is not involved. A TTL of five minutes is fairly worthless for a caching
> server. It only helps if it hits the same device within five minutes; as
> Hobbit is pinging every five minutes (default), you will most likely always
> be pulling from your master/slaves...
>
>
>
>        Phil
>
>        2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>
>        Well, almost all (a good 99%) of my hosts have the testip tag, so it
>        doesn't need to look up the names.  The things it does look up are 5m
>        TTLs though.
>
>
>
>        On 5/19/08, Phil Wild <philwild at gmail.com> wrote:
>        > What is ttl set to for your domain? It would be interesting to
> see if the
>        > issue reduces with a higher ttl. Another way to ensure this is
> not the area
>        > of the issue would be to set the dns server up as a slave.
>        >
>        > Phil
>        >
>        > 2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>        >
>        >> That was someone's theory in a very large post about this
> issue in the
>        >> past.  I did install a caching only named on the box and it
> did not
>        >> fix the problem.
>        >>
>        >> Did relieve the stress of my other DNS server though :)
>        >>
>        >>
>        >>
>        >> On 5/19/08, Phil Wild <philwild at gmail.com> wrote:
>        >> > Hi Josh,
>        >> >
>        >> > This doesn't relate to the apache error, it relates to your
> problem...
>        >> This
>        >> > is a theory...
>        >> >
>        >> > I am wondering if you are running a caching name server on
> your hobbit
>        >> > installation? If not, I am wondering if the fping places
> too high a load
>        >> on
>        >> > your dns server and misses the occasional host. Even with
> a caching dns
>        >> > server you may see the issue every time ttl expires.
>        >> >
>        >> > Phil
>        >> >
>        >> > 2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>        >> >
>        >> >> Gavin,
>        >> >>
>        >> >> I am having a very similar issue - though it is not every
> single day.
>        >>  My
>        >> >> issue is that every host (or almost all of the hosts) will
> have
>        >> >> conn:red
>        >> >> and
>        >> >> then come back up ~60s later.  I just confirmed this
> weekend that it is
>        >> >> not
>        >> >> related to the Via NIC (using an Intel Pro/100 S now).
>        >> >>
>        >> >> An issue like that is almost always Apache related.  Can
> you post the
>        >> >> errors in /var/log/httpd/error_log from this time period?
>        >> >>
>        >> >> Josh
>        >> >>
>        >> >>
>        >> >> On Mon, May 19, 2008 at 3:26 PM, Gavin Leonard
> <gleonard at progrexion.com
>        >> >
>        >> >> wrote:
>        >> >>
>        >> >>>  Every morning at 7am I get pages from every host I
> monitor including
>        >> the
>        >> >>> display server, that its connection recovered.. then it
> runs great for
>        >> >>> the
>        >> >>> next 23hrs.  looking at hobbit web page I see no down
> time nor do the
>        >> >>> servers show any down time.  But when I click on the
> historical web
>        >> link
>        >> >>> to
>        >> >>> see the info.. I get this.. I really love hobbit..  but I
> am not a Web
>        >> >>> guy
>        >> >>> at all and I think it might be apache related...
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>> *Internal Server Error*
>        >> >>>
>        >> >>> The server encountered an internal error or
> misconfiguration and was
>        >> >>> unable to complete your request.
>        >> >>>
>        >> >>> Please contact the server administrator, root at localhost
> and inform
>        >> them
>        >> >>> of the time the error occurred, and anything you might
> have done that
>        >> may
>        >> >>> have caused the error.
>        >> >>>
>        >> >>> More information about this error may be available in the
> server error
>        >> >>> log.
>        >> >>>  ------------------------------
>        >> >>>
>        >> >>> *Apache/2.0.54 (Yellowdog) Server at misery.pgx.local
> Port 80*
>        >> >>>
>        >> >>> *Gavin Leonard*
>        >> >>>
>        >> >>> Director, Systems-Network Engineering
>        >> >>>
>        >> >>> *T*
>        >> >>>
>        >> >>>  801-828-1735
>        >> >>>
>        >> >>> *F*
>        >> >>>
>        >> >>>  801-828-1704
>        >> >>>
>        >> >>> *E*
>        >> >>>
>        >> >>>  gleonard at progrexion.com
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>> Research | Marketing | Sales Generation
>        >> >>>
>
>        >> >>> *www.progrexion.com <http://www.progrexion.com/> *
>
> <http://www.progrexion.com/>
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>> This email and its contents are confidential. If you are
> not the
>        >> intended
>        >> >>> recipient, delete this email and do not use or disclose
> the
>        >> >>> information
>        >> >>> within this email or its attachments. Thank you.
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>>
>        >> >>
>        >> >>
>        >> >>
>        >> >> --
>        >> >> Josh Luthman
>        >> >> Office: 937-552-2340
>        >> >> Direct: 937-552-2343
>        >> >> 1100 Wayne St
>        >> >> Suite 1337
>        >> >> Troy, OH 45373
>        >> >>
>        >> >> Those who don't understand UNIX are condemned to reinvent
> it, poorly.
>        >> >> --- Henry Spencer
>        >> >
>        >> >
>        >> >
>        >> >
>        >> > --
>        >> > Tel: 0400 466 952
>        >> > Fax: 0433 123 226
>
>        >> > email: philwild AT gmail.com <http://gmail.com/>
>
>        >> >
>        >>
>        >>
>        >> --
>        >> Josh Luthman
>        >> Office: 937-552-2340
>        >> Direct: 937-552-2343
>        >> 1100 Wayne St
>        >> Suite 1337
>        >> Troy, OH 45373
>        >>
>        >> Those who don't understand UNIX are condemned to reinvent it,
> poorly.
>        >> --- Henry Spencer
>        >>
>        >> To unsubscribe from the hobbit list, send an e-mail to
>        >> hobbit-unsubscribe at hswn.dk
>        >>
>        >>
>        >>
>        >
>        >
>        > --
>        > Tel: 0400 466 952
>        > Fax: 0433 123 226
>
>        > email: philwild AT gmail.com <http://gmail.com/>
>
>        >
>
>



-- 
Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer