[Xymon] alert/hostname loading (was Re: xymon hostdata module going rogue)

John Thurston john.thurston at alaska.gov
Tue Dec 1 22:03:03 CET 2015


On 12/1/2015 11:48 AM, J.C. Cleaver wrote:
- snip -
>
> Hmm. This seems to be fundamentally a different issue than the "hostdata
> module going rogue" thing, which was about zombies never being picked up.
>
> AFAICT, somehow the hosts tree structure is getting clobbered as a result
> of the drop (assuming all of those hosts are expected to be existing).

See my later message for its relation to 'drop' activity.

> There were a few patches for things in xymond.c at one point, and more
> error checking when going to POSIX btrees generally, but I hadn't
> encountered this in other intermittent hostlist readers.
>
> 1) Which version of Solaris is this?

Solaris 10, most recent update, SPARC

> 2) Have you experienced this in other workers for xymon? (IE,
> xymond_client not being able to look up hostnames after a drop -- would
> probably lead to random purples)

I haven't seen behavior like that with other worker processes.
Is there a way to interactively run a worker process and have it hit the 
daemon process for the hostnames?
Aside from making the process dump core, is there a way to get the 
daemon to spill its current list of hostnames?
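
On getting the daemon to spill its hostnames: one possible approach, untested in this thread, is to send xymond a "xymondboard fields=hostname" message, the same kind of request the xymon client utility sends. The sketch below does that over TCP from Python. The helper names (query_xymond, parse_hostnames, list_board_hosts) are hypothetical, and 127.0.0.1 with port 1984 (the usual xymond default) is an assumption about the local setup. It only lists hosts that have statuses on xymond's board, which may differ from the host list a worker such as xymond_alert has loaded.

```python
import socket

XYMOND_PORT = 1984  # assumption: xymond listens on its usual default port


def query_xymond(message, host="127.0.0.1", port=XYMOND_PORT, timeout=10.0):
    """Send one message to xymond and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message.encode("ascii"))
        # Half-close the connection so the daemon knows the request is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def parse_hostnames(reply):
    """Return the sorted, de-duplicated hostnames from a one-name-per-line reply."""
    return sorted({line.strip() for line in reply.splitlines() if line.strip()})


def list_board_hosts(host="127.0.0.1", port=XYMOND_PORT):
    """Ask xymond for the hostname field of every status on its board."""
    reply = query_xymond("xymondboard fields=hostname", host, port)
    return parse_hostnames(reply)
```

Calling list_board_hosts() on the Xymon server before and after a 'drop' and comparing the two lists might show whether the daemon's own view of the hosts changes, or only the worker's.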

> 3) Does issuing a "reload" command or -HUP to xymond_alert re-sync things?

I didn't do a 'reload', but I killed the "xymond_channel --channel=page 
--log=/var/log/xymon/alert.log xymond_alert" process and alerts started 
working again.

I haven't yet found a way to induce this failure, so I haven't yet 
identified the minimal recovery steps. I'm working on it, though.
-- 
    Do things because you should, not just because you can.

John Thurston    907-465-8591
John.Thurston at alaska.gov
Enterprise Technology Services
Department of Administration
State of Alaska
