[hobbit] wake up call

Phil Wild philwild at gmail.com
Thu May 22 03:36:16 CEST 2008


It sure sounds like your issue is with your DNS servers...

There are a couple of other things to try...

You can set --dns=ip for bbtest-net. This tells hobbit to use the IPs
specified in your bb-hosts file rather than passing the hostnames to the OS
name resolution libraries.
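
To get a feel for which hosts would still go through name resolution without
--dns=ip, something like the rough Python sketch below could list them. It
assumes the usual bb-hosts layout of "IP hostname # tags" and that, in the
default mode, entries tagged testip with a real IP skip the lookup. The
function name hosts_needing_dns is made up for illustration, and the handling
of 0.0.0.0 is my assumption, so check the bb-hosts man page before relying
on it.

```python
import ipaddress


def hosts_needing_dns(bb_hosts_text):
    """Return hostnames that would be resolved by name (assumed default mode)."""
    needing = []
    for raw in bb_hosts_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        entry, _, tags = line.partition("#")
        fields = entry.split()
        if len(fields) < 2:
            continue
        ip, hostname = fields[0], fields[1]
        try:
            ipaddress.IPv4Address(ip)
        except ValueError:
            continue  # not a host entry (page, group, include, ...)
        # Assumption: testip plus a real IP means no lookup is needed.
        if "testip" in tags.split() and ip != "0.0.0.0":
            continue
        needing.append(hostname)
    return needing
```

Feeding it the contents of bb-hosts should give the short list of names worth
watching during the next 7am episode.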

I would expect you will get the same result as you have now with all IPs
defined in /etc/hosts. It would be very interesting to know why this happens
at the same time every day. Can you describe your network and DNS topology?
What settings do you have in your SOA?
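
If you want numbers to go with that, a small script run from cron for a few
minutes either side of 7am could log how long each lookup takes. This is only
a sketch: time_lookups is a hypothetical helper, it goes through the OS
resolver via Python's socket.gethostbyname, and the host names in the usage
comment are placeholders.

```python
import socket
import time


def time_lookups(hostnames, resolver=socket.gethostbyname, clock=time.monotonic):
    """Resolve each name once; return (name, address or None, seconds taken)."""
    results = []
    for name in hostnames:
        start = clock()
        try:
            address = resolver(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError
            address = None
        elapsed = clock() - start
        results.append((name, address, elapsed))
    return results


# Example use (placeholder names, performs real lookups):
# for name, address, seconds in time_lookups(["host1.example", "host2.example"]):
#     print(name, address, round(seconds, 3))
```

Slow or failed lookups clustered around the time of the pages would point at
the resolver. If everything answers quickly, DNS is probably not the culprit.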

Cheers

Phil

2008/5/22 Josh Luthman <josh at imaginenetworksllc.com>:

> Tell me what email they're coming from and use josh at imaginenetworksllc.com
>
>
> On Wed, May 21, 2008 at 12:12 PM, Gavin Leonard <gleonard at progrexion.com>
> wrote:
>
>>  Sure.. just give me your pager # and they can wake you up… :)
>>
>>
>>
>> -Gavin
>>
>>
>>
>> *From:* Josh Luthman [mailto:josh at imaginenetworksllc.com]
>> *Sent:* Wednesday, May 21, 2008 10:07 AM
>>
>> *To:* hobbit at hswn.dk
>> *Subject:* Re: [hobbit] wake up call
>>
>>
>>
>> After those three mornings, would you mind commenting out those hosts to be
>> certain that reproduces the issue?
>>
>> On Wed, May 21, 2008 at 12:02 PM, Gavin Leonard <gleonard at progrexion.com>
>> wrote:
>>
>> Ok.. well it did not do it this morning after adding all of my monitored
>> hosts to the /etc/hosts file… I just copied my bb-hosts file into my
>> /etc/hosts file and modified it into the proper format.. no pages this
>> morning.. so it could have been a dns issue.. if I am clear for three more
>> mornings then I will be satisfied… I will let you know..
>>
>>
>>
>> -Gavin
>>
>>
>>
>> *From:* Josh Luthman [mailto:josh at imaginenetworksllc.com]
>> *Sent:* Tuesday, May 20, 2008 10:24 PM
>>
>>
>> *To:* hobbit at hswn.dk
>> *Subject:* Re: [hobbit] wake up call
>>
>>
>>
>> Thanks for the heads up.  I am very interested in knowing what is the
>> cause and more importantly the solution to your issue, as it may fix mine!
>>
>> It would be VERY nice to be able to print out uptime and availability reports
>> without the dozens of 1 minute outages.  I know my issue is related to the
>> box itself (hardware or software) as the issue appears on the hobbit server
>> itself.
>>
>> On Wed, May 21, 2008 at 12:17 AM, Gavin Leonard <gleonard at progrexion.com>
>> wrote:
>>
>> Most if not all of my servers are defined by ip anyway, I have a very
>> segmented network so dns is not very helpful across all the different
>> domains and subnets.. i use my hosts file for the most part.. now that I
>> think of it, I wonder if the ones in the host file are still ok?  I will let
>> you know…
>>
>>
>>
>> -Gavin
>>
>>
>>
>> *From:* Phil Wild [mailto:philwild at gmail.com]
>> *Sent:* Tuesday, May 20, 2008 7:12 PM
>>
>>
>> *To:* hobbit at hswn.dk
>> *Subject:* Re: [hobbit] wake up call
>>
>>
>>
>> Can I suggest you use IP addresses for a number of servers and see if they
>> survive through your next episode. That will give you an idea of where the
>> problem might be...
>>
>>
>>
>> It is the least amount of work towards identifying the cause.
>>
>>
>>
>> Cheers
>>
>>
>>
>> Phil
>>
>> 2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <katherine.hosch at navy.mil>:
>>
>> Check your apache log restarts in cron....
>>
>>
>> -----Original Message-----
>> From: Josh Luthman [mailto:josh at imaginenetworksllc.com]
>> Sent: Tuesday, May 20, 2008 10:38
>> To: hobbit at hswn.dk
>> Subject: Re: [hobbit] wake up call
>>
>> What most people suggest is having a local DNS server, on the Hobbitmon
>> server itself.
>>
>> As this is happening at the same time every single day I don't believe
>> DNS would be the cause of the issue, though it is worth taking a look at
>> until another idea comes along.
>>
>>
>> On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard
>> <gleonard at progrexion.com> wrote:
>>
>>
>>        Happened again this morning.. so I am going to try a different
>> dns server.
>>
>>
>>
>>        -Gavin
>>
>>
>>
>>        From: Phil Wild [mailto:philwild at gmail.com]
>>        Sent: Monday, May 19, 2008 10:38 PM
>>        To: hobbit at hswn.dk
>>        Subject: Re: [hobbit] wake up call
>>
>>
>>
>>        Hmmm... bummer, there goes that theory... If you are using IP
>> addresses, and you are still getting failures on these hosts, then dns
>> is not involved. A ttl of five minutes is fairly worthless for a caching
>> server. It only helps if it hits the same device again within five
>> minutes. As hobbit is pinging every five mins (default), you will most
>> likely always be pulling from your master/slaves...
>>
>>
>>
>>        Phil
>>
>>        2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>>
>>        Well almost all (a good 99%) of my hosts have the testip tag, so it
>>        doesn't need to look up the names.  The things it does look up are
>>        5m TTLs though.
>>
>>
>>
>>        On 5/19/08, Phil Wild <philwild at gmail.com> wrote:
>>        > What is ttl set to for your domain? It would be interesting to see
>>        > if the issue reduces with a higher ttl. Another way to ensure this
>>        > is not the area of the issue would be to set the dns server up as a
>>        > slave.
>>        >
>>        > Phil
>>        >
>>        > 2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>>        >
>>        >> That was someone's theory in a very large post about this issue in
>>        >> the past.  I did install a caching only named on the box and it did
>>        >> not fix the problem.
>>        >>
>>        >> Did relieve the stress of my other DNS server though :)
>>        >>
>>        >>
>>        >>
>>        >> On 5/19/08, Phil Wild <philwild at gmail.com> wrote:
>>        >> > Hi Josh,
>>        >> >
>>        >> > This doesn't relate to the apache error, it relates to your
>>        >> > problem... This is a theory...
>>        >> >
>>        >> > I am wondering if you are running a caching name server on your
>>        >> > hobbit installation? If not, I am wondering if the fping places
>>        >> > too high a load on your dns server and misses the occasional
>>        >> > host. Even with a caching dns server you may see the issue every
>>        >> > time ttl expires.
>>        >> >
>>        >> > Phil
>>        >> >
>>        >> > 2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>>        >> >
>>        >> >> Gavin,
>>        >> >>
>>        >> >> I am having a very similar issue - though it is not every single
>>        >> >> day.  My issue is that every host (or almost all of the hosts)
>>        >> >> will have conn:red and then come back up ~60s later.  I just
>>        >> >> confirmed this weekend that it is not related to the Via NIC
>>        >> >> (using an Intel Pro/100 S now).
>>        >> >>
>>        >> >> An issue like that is almost always Apache related.  Can you post
>>        >> >> the errors in /var/log/httpd/error_log from this time period?
>>        >> >>
>>        >> >> Josh
>>        >> >>
>>        >> >>
>>        >> >> On Mon, May 19, 2008 at 3:26 PM, Gavin Leonard
>> <gleonard at progrexion.com
>>        >> >
>>        >> >> wrote:
>>        >> >>
>>        >> >>> Every morning at 7am I get pages from every host I monitor,
>>        >> >>> including the display server, that its connection recovered..
>>        >> >>> then it runs great for the next 23hrs.  Looking at the hobbit web
>>        >> >>> page I see no down time, nor do the servers show any down time.
>>        >> >>> But when I click on the historical web link to see the info.. I
>>        >> >>> get this.. I really love hobbit.. but I am not a Web guy at all
>>        >> >>> and I think it might be apache related...
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>> *Internal Server Error*
>>        >> >>>
>>        >> >>> The server encountered an internal error or misconfiguration and
>>        >> >>> was unable to complete your request.
>>        >> >>>
>>        >> >>> Please contact the server administrator, root at localhost and
>>        >> >>> inform them of the time the error occurred, and anything you might
>>        >> >>> have done that may have caused the error.
>>        >> >>>
>>        >> >>> More information about this error may be available in the server
>>        >> >>> error log.
>>        >> >>>  ------------------------------
>>        >> >>>
>>        >> >>> *Apache/2.0.54 (Yellowdog) Server at misery.pgx.local Port 80*
>>        >> >>>
>>        >> >>> *Gavin Leonard*
>>        >> >>>
>>        >> >>>
>>        >> >>> Director, Systems-Network Engineering
>>        >> >>>
>>        >> >>> *T*
>>        >> >>>
>>        >> >>>  801-828-1735
>>        >> >>>
>>        >> >>> *F*
>>        >> >>>
>>        >> >>>  801-828-1704
>>        >> >>>
>>        >> >>> *E*
>>        >> >>>
>>        >> >>>  gleonard at progrexion.com
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>> Research | Marketing | Sales Generation
>>        >> >>>
>>        >> >>> www.progrexion.com <http://www.progrexion.com/>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>> This email and its contents are confidential. If you are
>> not the
>>        >> intended
>>        >> >>> recipient, delete this email and do not use or disclose
>> the
>>        >> >>> information
>>        >> >>> within this email or its attachments. Thank you.
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>
>>        >> >>
>>        >> >>
>>        >> >> --
>>        >> >> Josh Luthman
>>        >> >> Office: 937-552-2340
>>        >> >> Direct: 937-552-2343
>>        >> >> 1100 Wayne St
>>        >> >> Suite 1337
>>        >> >> Troy, OH 45373
>>        >> >>
>>        >> >> Those who don't understand UNIX are condemned to reinvent
>> it, poorly.
>>        >> >> --- Henry Spencer
>>        >> >
>>        >> >
>>        >> >
>>        >> >
>>        >> > --
>>        >> > Tel: 0400 466 952
>>        >> > Fax: 0433 123 226
>>
>>        >> > email: philwild AT gmail.com <http://gmail.com/>
>>
>>        >> >
>>        >>
>>        >>
>>        >>
>>        >> To unsubscribe from the hobbit list, send an e-mail to
>>        >> hobbit-unsubscribe at hswn.dk
>>        >>
>>        >>
>>        >>
>>        >
>>        >
>>
>>        >
>>
>>
>>
>



-- 
Tel: 0400 466 952
Fax: 0433 123 226
email: philwild AT gmail.com