[hobbit] wake up call

Josh Luthman josh at imaginenetworksllc.com
Wed May 21 03:38:07 CEST 2008


Also, since you are lucky enough to have this problem at the same time
every day, I would advise doing a packet capture with tcpdump.
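
Something along these lines would do for the capture. It is only a rough
sketch: the interface name, the output file name, the ten-minute window and
the filter (ICMP for the ping tests, port 53 for the DNS theory) are all my
assumptions, so adjust them to your box. Run it as root.

```python
import subprocess

def build_capture_command(interface="any", out_file="hobbit-7am.pcap"):
    """Build a tcpdump argv list that records ICMP and DNS traffic to a file.

    The interface and file name defaults are placeholders, not values from
    anyone's real setup.
    """
    return [
        "tcpdump",
        "-n",              # do not resolve addresses, so the capture adds no DNS load
        "-i", interface,   # assumed interface; "any" is Linux-specific
        "-w", out_file,    # write raw packets to a pcap file for later inspection
        "icmp", "or", "port", "53",  # ping tests plus DNS lookups
    ]

def capture(seconds=600, **kwargs):
    """Run tcpdump for roughly `seconds`, then stop it cleanly."""
    proc = subprocess.Popen(build_capture_command(**kwargs))
    try:
        proc.wait(timeout=seconds)   # returns early only if tcpdump exits by itself
    except subprocess.TimeoutExpired:
        proc.terminate()             # SIGTERM lets tcpdump flush and close the file
        proc.wait()
    return proc.returncode
```

Calling capture() from cron a few minutes before 07:00 should bracket the
event; the resulting file can then be read back with tcpdump -r.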


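On Phil's TTL point quoted below, a toy model makes the arithmetic easy to
see. This is a simplified sketch, not how named actually works: it assumes a
single name, perfectly regular polls, and a record that counts as expired as
soon as its age reaches the TTL. The function name is made up for the example.

```python
def cache_hit_ratio(ttl, poll_interval, polls):
    """Fraction of lookups answered from cache for one name polled regularly.

    ttl and poll_interval are in seconds. A cached record is usable only
    while its age is strictly less than ttl (an assumption of this model).
    """
    fetched_at = None   # time the record was last fetched from upstream
    hits = 0
    for i in range(polls):
        now = i * poll_interval
        if fetched_at is not None and now - fetched_at < ttl:
            hits += 1           # still fresh: served from cache
        else:
            fetched_at = now    # missing or expired: go upstream and re-cache
    return hits / polls

# 5m TTL polled every 5m: every lookup goes upstream.
print(cache_hit_ratio(300, 300, 12))
# 1h TTL polled every 5m: only the first lookup of each hour goes upstream.
print(cache_hit_ratio(3600, 300, 12))
```

In practice poll times jitter a little, so a 5m TTL against a 5m poll may
score the odd hit, but the cache still does very little work.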

On 5/20/08, Phil Wild <philwild at gmail.com> wrote:
> Can I suggest you use IP addresses for a number of servers and see if they
> survive through your next episode. That will give you an idea of where the
> problem might be...
>
> It is the least amount of work towards identifying the cause.
>
> Cheers
>
> Phil
>
> 2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <katherine.hosch at navy.mil>:
>
>> Check your apache log restarts in cron....
>>
>> -----Original Message-----
>> From: Josh Luthman [mailto:josh at imaginenetworksllc.com]
>> Sent: Tuesday, May 20, 2008 10:38
>> To: hobbit at hswn.dk
>> Subject: Re: [hobbit] wake up call
>>
>> What most people suggest is having a local DNS server, on the Hobbitmon
>> server itself.
>>
>> As this is happening at the same time every single day I don't believe
>> DNS would be the cause of the issue, though it is worth taking a look at
>> until another idea comes along.
>>
>>
>> On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard
>> <gleonard at progrexion.com> wrote:
>>
>>
>>        Happened again this morning, so I am going to try a
>>        different DNS server.
>>
>>
>>
>>        -Gavin
>>
>>
>>
>>        From: Phil Wild [mailto:philwild at gmail.com]
>>        Sent: Monday, May 19, 2008 10:38 PM
>>        To: hobbit at hswn.dk
>>        Subject: Re: [hobbit] wake up call
>>
>>
>>
>>        Hmmm... bummer, there goes that theory. If you are using IP
>>        addresses and you are still getting failures on these hosts,
>>        then DNS is not involved. A TTL of five minutes is fairly
>>        worthless for a caching server: it only helps if the same
>>        name is looked up again within five minutes, and since hobbit
>>        pings every five minutes (by default), you will most likely
>>        always be pulling from your master/slaves.
>>
>>
>>
>>        Phil
>>
>>        2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>>
>>        Well, almost all (a good 99%) of my hosts have the testip tag,
>>        so it doesn't need to look up the names.  The things it does
>>        look up have 5m TTLs, though.
>>
>>
>>
>>        On 5/19/08, Phil Wild <philwild at gmail.com> wrote:
>>        > What is ttl set to for your domain? It would be interesting to
>>        > see if the issue reduces with a higher ttl. Another way to ensure
>>        > this is not the area of the issue would be to set the dns server
>>        > up as a slave.
>>        >
>>        > Phil
>>        >
>>        > 2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>>        >
>>        >> That was someone's theory in a very large post about this
>>        >> issue in the past.  I did install a caching-only named on the
>>        >> box and it did not fix the problem.
>>        >>
>>        >> Did relieve the stress of my other DNS server though :)
>>        >>
>>        >>
>>        >>
>>        >> On 5/19/08, Phil Wild <philwild at gmail.com> wrote:
>>        >> > Hi Josh,
>>        >> >
>>        >> > This doesn't relate to the apache error, it relates to your
>>        >> > problem... This is a theory...
>>        >> >
>>        >> > I am wondering if you are running a caching name server on
>>        >> > your hobbit installation? If not, I am wondering if the fping
>>        >> > places too high a load on your dns server and misses the
>>        >> > occasional host. Even with a caching dns server you may see
>>        >> > the issue every time ttl expires.
>>        >> >
>>        >> > Phil
>>        >> >
>>        >> > 2008/5/20 Josh Luthman <josh at imaginenetworksllc.com>:
>>        >> >
>>        >> >> Gavin,
>>        >> >>
>>        >> >> I am having a very similar issue, though it is not every
>>        >> >> single day.  My issue is that every host (or almost all of
>>        >> >> the hosts) will have conn:red and then come back up ~60s
>>        >> >> later.  I just confirmed this weekend that it is not related
>>        >> >> to the Via NIC (using an Intel Pro/100 S now).
>>        >> >>
>>        >> >> An issue like that is almost always Apache related.  Can you
>>        >> >> post the errors in /var/log/httpd/error_log from this time
>>        >> >> period?
>>        >> >>
>>        >> >> Josh
>>        >> >>
>>        >> >>
>>        >> >> On Mon, May 19, 2008 at 3:26 PM, Gavin Leonard
>>        >> >> <gleonard at progrexion.com> wrote:
>>        >> >>
>>        >> >>> Every morning at 7am I get pages from every host I monitor,
>>        >> >>> including the display server, saying that its connection
>>        >> >>> recovered. Then it runs great for the next 23 hours. Looking
>>        >> >>> at the hobbit web page I see no downtime, nor do the servers
>>        >> >>> show any downtime. But when I click on the historical web
>>        >> >>> link to see the info, I get this. I really love hobbit, but I
>>        >> >>> am not a Web guy at all and I think it might be apache
>>        >> >>> related...
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>>
>>        >> >>> *Internal Server Error*
>>        >> >>>
>>        >> >>> The server encountered an internal error or misconfiguration
>>        >> >>> and was unable to complete your request.
>>        >> >>>
>>        >> >>> Please contact the server administrator, root at localhost
>>        >> >>> and inform them of the time the error occurred, and anything
>>        >> >>> you might have done that may have caused the error.
>>        >> >>>
>>        >> >>> More information about this error may be available in the
>>        >> >>> server error log.
>>        >> >>>  ------------------------------
>>        >> >>>
>>        >> >>> *Apache/2.0.54 (Yellowdog) Server at misery.pgx.local Port 80*
>>        >> >>>
>>        >> >>> *Gavin Leonard*
>>        >> >>> Director, Systems-Network Engineering
>>        >> >>
>>        >> >>
>>        >> >>
>>        >> >> --
>>        >> >> Josh Luthman
>>        >> >> Office: 937-552-2340
>>        >> >> Direct: 937-552-2343
>>        >> >> 1100 Wayne St
>>        >> >> Suite 1337
>>        >> >> Troy, OH 45373
>>        >> >>
>>        >> >> Those who don't understand UNIX are condemned to reinvent
>>        >> >> it, poorly.
>>        >> >> --- Henry Spencer
>>        >> >
>>        >> >
>>        >> >
>>        >> >
>>        >> > --
>>        >> > Tel: 0400 466 952
>>        >> > Fax: 0433 123 226
>>        >> > email: philwild AT gmail.com <http://gmail.com/>
>>        >> >
>>        >>
>>        >>
>>        >> To unsubscribe from the hobbit list, send an e-mail to
>>        >> hobbit-unsubscribe at hswn.dk
>>        >>
>>        >>
>>        >>
>>        >
>>        >
>>         >
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>


-- 
Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer


