[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [hobbit] Hobbit HTTP Monitor Anomalies



I don't know about doing retries in bb-net, but it's really quite easy to
roll your own http/https tests.  A very simple case could look something
like this:
   #!/bin/bash

   curl -s -S -L -retry 3 --max-time 30 -o /dev/null
https://server.domain.com/whatever
   if [ $? -eq 0 ]; then
     COLOR=green
     MESSAGE="page fetched OK"
   else
     COLOR=red
     MESSAGE="page fetch failed"
   fi

   $BB $BBDISP "status server,domain,com.https $COLOR `date`

  $MESSAGE"

The status message needs commas in the fully qualified server name, because
the column name is separated from the server name by a dot...  The content
in $MESSAGE can include html, such as a clickable url so your ops can try to
click through to verify the server status.

I've done a bunch of these in the last 8 years...  :)

Ralph Mitchell


On Thu, Jun 18, 2009 at 10:52 AM, James Wade <jkwade (at) futurefrontiers.com>wrote:

>  We connected up a sniffer. It appears that the
>
> HTTPS request never receives a response. Hobbit is
>
> sending it out, but a reply never occurs.
>
>
>
> James
>
>
>  ------------------------------
>
> *From:* James Wade [mailto:jkwade (at) futurefrontiers.com]
> *Sent:* Thursday, June 18, 2009 9:42 AM
> *To:* hobbit (at) hswn.dk
> *Subject:* [hobbit] Hobbit HTTP Monitor Anomalies
>
>
>
> So, last weekend, we had a huge conference call because
>
> two Production HTTPS URL’s that we monitor with Hobbit were
>
> getting timeouts exceeding 30 seconds.
>
>
>
> No other URL’s on the Hobbit Server that we monitor were
>
> getting any types of timeouts.
>
>
>
> I have two Hobbit Servers, located at two physical sites, Production
>
> & Development. The Production Hobbit Server said that no timeouts
>
> occurred on the two HTTPS URL’s. However, the Development Hobbit
>
> Server (which is the primary monitoring server), indicated that the URL’s
>
> were timing out.
>
>
>
> So, the only difference appeared to be in the network. However, our
>
> network team looked at the network and showed no latency, and
>
> the placed a sniffer on the Development Hobbit Server and saw all
>
> HTTPS packets coming back without timeouts.
>
>
>
> Early this week, we placed a box that sniffs that network and can monitor
>
> for specific events. In this case, it’s connected to the same network as
>
> the Development Hobbit Server, and its monitoring the HTTPS request
>
> to the two URL’s.
>
>
>
> Last night, I received timeouts from Hobbit on the URLs, but the sniffer
>
> showed not such timeouts.
>
>
>
> I’ve checked the Hobbit server,  CPU, Load, Network, all the graphs don’t
>
> show any correlation to the timeouts.
>
>
>
> Does anyone have any ideas on this puzzle?
>
>
>
> Thanks….James
>
>
>
>
>
>
>
>
>
>
>