[Xymon] hostname retrieval is broken after adding a host

Novosielski, Ryan novosirj at ca.rutgers.edu
Tue Feb 2 16:36:51 CET 2016


Possibly worth chiming in here that I use the holidays list, in case anyone is thinking of "simplifying" the code. :-)

--
____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'

On Feb 2, 2016, at 09:42, J.C. Cleaver <cleaver at terabithia.org> wrote:

On Mon, February 1, 2016 4:59 pm, John Thurston wrote:
On 2/1/2016 2:41 PM, J.C. Cleaver wrote:
Hi,

Actually, I think I must have missed your final response on this at
http://lists.xymon.com/pipermail/xymon/2015-December/042787.html ; my
apologies.

On what's happening, I think this might be a side-effect of
https://sourceforge.net/p/xymon/code/7651/ , which added a dummy record
for the purposes of command-line --test functionality when the host
doesn't exist. For an incoming unknown host (from xymond_alert's
perspective), the same path is being executed.

I've applied the patch to my non-production server and performed my
failure-reproduction steps. The behavior is certainly better. The alert
process is no longer tanking for every message received :)

What I do get, for a newly added host, is "Checking criteria for host
'foo.bar.com', which is not defined. Will not alert until hostlist
reload."  This happens following all subsequent runs of xymonnet.

Is there anything which will trigger a hostlist reload?

Is there a tidy way to manually reload the list?

It doesn't seem to happen until I kill the "xymond_channel
--channel=page" process. This seems like a hamfisted thing to do after
every edit of hosts.cfg :(

Related question:

If this is in main code, and not some odd-ball null/EOF/posix problem
(as has often tripped up my Solaris systems in the recent past), why am
I the only one seeing this failure? Why aren't the folks running Linux
having their alerts fail?




This one took me quite a while to figure out, mainly because I was looking
at the wrong code base for a while.

It turns out the host info record here is *only* used for display groups
and holiday lookups (probably rarely used), within the context of
alerting. In all other cases, the host not being in the hostlist doesn't
impact the application of alert rules, since all the needed info is coming
in via the '@@page' message itself. The patch should be updated to let
those alerts come straight through instead of exiting out when the host
record isn't found.


My confusion came from a different issue: xymond_alert actually never
reloads the hosts config at all! I found/fixed this back in Sept '14 in
the RPMs but it wasn't applied into 4.3 back then.

I'd been living with that code for so long I forgot that that reload
wasn't needed here -- and, obviously, alerts have been working *in
general*... (We only noticed the lack of reload because we were dependent
on a dynamic value in the hosts.cfg line coming through to the alert
script via XMH_RAW in updated form.)

xymond_alert reloading was put into 4.4 at
https://sourceforge.net/p/xymon/code/7776/ among the patch bursts, but the
live host add issue has probably been in since this release. There are a
few takeaways from this... but this needs to be fixed in 4.3 (among
several other incoming issues that are pending confirmation).


Can you please check the two included patches? One is an update for the
previous one, which passes the alert check through (only adding the dummy
record in --test mode to begin with), the other adds hosts.cfg reloading
on intervals or on demand. It's based on the 4.4 version, but with only a
small change. I'd like to add both, as I can't see any drawback to
reloading hosts.cfg from xymond_alert's perspective, but the first may be
sufficient to get back to the status quo.
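
For readers without the attachments, the two behaviours described above can
be sketched roughly as follows. This is only an illustration in Python, not
the actual patches (xymond_alert is C code): the class and function names,
the 600-second default interval and the dict-based host records are all
assumptions made for the example.

```python
import time


class ReloadingHostList:
    """Hypothetical host list that reloads on an interval or on demand."""

    def __init__(self, loader, interval_seconds=600, clock=time.monotonic):
        self._loader = loader              # callable returning {hostname: record}
        self._interval = interval_seconds  # assumed default, not from the patch
        self._clock = clock
        self._hosts = loader()
        self._loaded_at = clock()
        self._reload_requested = False

    def request_reload(self):
        """Mark the list stale, e.g. from a signal handler."""
        self._reload_requested = True

    def _refresh_if_needed(self):
        expired = self._clock() - self._loaded_at >= self._interval
        if self._reload_requested or expired:
            self._hosts = self._loader()
            self._loaded_at = self._clock()
            self._reload_requested = False

    def lookup(self, hostname):
        """Return the host record, or None if the host is not known yet."""
        self._refresh_if_needed()
        return self._hosts.get(hostname)


def host_record_for_alert(hostlist, hostname, test_mode=False):
    """Return a record for alert checking without rejecting unknown hosts.

    A dummy record is created only in test mode; otherwise None is returned
    and the caller is expected to carry on with the alert check, using the
    data carried in the incoming message itself.
    """
    record = hostlist.lookup(hostname)
    if record is None and test_mode:
        record = {"hostname": hostname, "dummy": True}
    return record
```

The point of the second function is that a missing host record is not
treated as a reason to stop; only lookups that truly need the record (display
groups, holidays) would have to cope with None.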


Regards,

-jc
<localalertmode-2.patch>
<reloadalert.patch>
_______________________________________________
Xymon mailing list
Xymon at xymon.com
http://lists.xymon.com/mailman/listinfo/xymon