sorry for the constant revision (was: re: purple haze)
Rob Munsch
rmunsch at solutionsforprogress.com
Tue Nov 1 20:40:48 CET 2005
Last email for a while, I promise; I'm chainsmoking packets at this
point. But I found this:
---
2005-11-01 14:14:20 TCP tests completed normally
2005-11-01 14:14:20 Execution of 'fping -Ae' failed with error-code 99
2005-11-01 14:14:20 Sending results for service conn
---
Okay, it can't find fping. But...
---
hobbit@randomaccess ~/server/bin $ more ../etc/hobbitserver.cfg |grep fping
# Make sure the path includes the directories where you have fping, mail and (optionally) ntpdate installed,
FPING="/usr/sbin/fping" # Path and options for the 'fping' program.
hobbit@randomaccess ~/server/bin $ /usr/sbin/fping -Ae brassai
10.10.10.15 is alive (0.15 ms)
hobbit@randomaccess ~/server/bin $
---
So it should be finding fping just fine, and fping is working.
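
One difference that may be worth ruling out: the manual test above passes the hostname as an argument. As far as I understand it (an assumption, nothing in the log confirms this), bbtest-net hands the addresses to fping on standard input instead. A minimal sketch for checking the exit status that way; `probe_exit` is a made-up helper name, not part of Hobbit:

```shell
# Hypothetical helper (not part of Hobbit): run a command with a host list
# on stdin and print the command's exit status.
# Usage: probe_exit "10.10.10.15" /usr/sbin/fping -Ae
probe_exit() {
    hosts=$1
    shift
    status=0
    # The status of a pipeline is the status of its last command,
    # i.e. the program under test, not printf.
    printf '%s\n' "$hosts" | "$@" >/dev/null 2>&1 || status=$?
    echo "exit status: $status"
}
```

The fping man page documents exit statuses 0 through 4 only, so if this prints something in that range while bbtest-net still reports 99, the 99 is probably not coming from fping itself.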
The path is in hobbitserver.cfg:
---
# Make sure the path includes the directories where you have fping, mail and (optionally) ntpdate installed,
# as well as the BBHOME/bin directory where all of the Hobbit programs reside.
PATH="/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/home/hobbit/server/bin"
...
# For bbtest-net
...
FPING="/usr/sbin/fping" # Path and options for the 'fping' program.
---
and
[bbnet]
ENVFILE /home/hobbit/server/etc/hobbitserver.cfg
------------
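
Since grep also matches the comment lines, it can be easier to pull out just the active setting. A rough sketch, assuming each setting sits on its own line in the NAME="value" form shown above; `cfg_value` is a hypothetical helper, not a Hobbit tool:

```shell
# Hypothetical helper: print the value of every active NAME="value" line
# in a config file, skipping commented-out lines. Assumes one
# double-quoted setting per line.
# Usage: cfg_value /home/hobbit/server/etc/hobbitserver.cfg FPING
cfg_value() {
    sed -n "s/^[[:space:]]*$2=\"\([^\"]*\)\".*/\1/p" "$1"
}
```

If this prints more than one line, the setting is defined more than once in the file, which would be worth knowing before trusting either value.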
So, by all the above: fping is functional, it is accessible by the
'hobbit' user, it can reach the clients, it is in the PATH, it is
defined in the ENVFILE bbnet is using.
So what's gone wrong??
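
One more check that might narrow it down: an interactive shell carries its own PATH and environment, so a command that works at the prompt can still fail for a process started with different settings. A minimal sketch, assuming nothing about how bbtest-net actually builds its environment; `run_with_path` is a hypothetical helper that runs a command with an empty environment plus only the PATH you give it:

```shell
# Hypothetical helper: run a command with an otherwise empty environment
# and only the given PATH, to mimic a process that does not inherit the
# login shell's settings.
# Usage: run_with_path "/bin:/usr/bin:/sbin:/usr/sbin" fping -Ae brassai
run_with_path() {
    search_path=$1
    shift
    status=0
    env -i PATH="$search_path" "$@" || status=$?
    # Report on stderr so the command's own output stays clean.
    echo "exit status: $status" >&2
    return "$status"
}
```

With env, a status of 127 means the command was not found on that PATH, and 126 means it was found but could not be run.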
Rob Munsch wrote:
> Since ssh, ldap, and dns are tests run from the serverside (cpu etc
> remaining green indicates the clients are running and communicating
> OK, right?), i ran
>
> ./bbtest-net --concurrency=50 --checkresponse --no-update --timing --debug
>
> Now, i can ping and ssh to all clients from server just fine. But i
> see this:
>
> ---
> 2005-11-01 14:14:20 Adding to combo msg: status brassai.conn red <!-- [flags:ordAstILe] --> Tue Nov 1 14:14:20 2005 conn NOT ok
> status brassai.conn red <!-- [flags:ordAstILe] --> Tue Nov 1 14:14:20 2005 conn NOT ok
>
> Service conn on brassai is not OK : Host does not respond to ping
>
> System unreachable for 3 poll periods (56 seconds)
> ---
>
> Aha. Since the ping test fails, why test other net services? So now
> it makes sense; the net tests are not being run, hence the purple.
>
> a'course, i don't know why the nettest is suddenly unable to ping
> anything. It is getting the right IPs internally:
>
> ---
> 2005-11-01 14:14:20 Got DNS result for host doisneau : 10.x.x.x
> 2005-11-01 14:14:20 Got DNS result for host brassai : 10.x.x.x
> 2005-11-01 14:14:20 Got DNS result for host moadib : 10.x.x.x
> ---
>
> and i thought cranking the concurrency way down might help, but
> apparently it doesn't.
>
> So, i'm glad i found the cause... now i just need to find out the
> cause's cause. o_O
>
--
Rob Munsch
Systems Analyst, Solutions for Progress
http://www.solutionsforprogress.com