[Xymon] How to have a test not specifically associated with any given host?

Wed Aug 30 16:14:26 CEST 2017


On 30/8/17 17:39, Richard L. Hamilton wrote:
> Let's say I want to add a test that probes for a DHCP server or relay...and that there should be one, and only one on a given LAN (or if more than one, only authorized ones).  The test could be run on any client on the LAN, or on the xymon server for the LAN it's on.  The test results have nothing to do with the system the test runs on, save that it's on the LAN in question.
>
> (The test isn't that hard; nmap --script broadcast-dhcp-discover -e net0  (change interface name as needed) can collect the data, and the output be examined to look for "Response 1 of 1:" and check that the IP following "Server Identifier:" is as expected; if not, either the DHCP server is missing, or there's one or more that shouldn't be there, or, if there's a single response but the IP is wrong, worst of all, there's one DHCP server, but it's bogus or misconfigured.)
>
> Do I make up a pseudo-hostname (e.g. give each LAN a name of its own), put that in hosts.cfg with a reserved-as-bogus address (can't use 0.0.0.0 if that will cause lookup failures), maybe mark it noconn (since the IP should not be pinged) and multihomed (since the IP is bogus, and the IP that the report comes from may be different), and with the test name?  And then make the test script report as if the hostname was the made-up LAN name?  And of course NOT use any tests handled by xymonnet for that entry?
Consider the hostname field as a label. It doesn't have to be a 
hostname. Equally, the IP address doesn't have to be a valid IP; as you 
know, we can use the value 0.0.0.0, which is not a valid IP. Some other 
IP might also make sense, such as 192.168.1.255 to indicate the 
broadcast address, which is in fact what you are monitoring, but in 
your case xymon will never care what the IP or hostname is.
I would suggest 0.0.0.0 with noconn to prevent xymonnet doing anything 
with it. Keep the label in hostname syntax, but it doesn't need to be in DNS.
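
A minimal sketch of what that could look like, in Python purely for illustration. The helper names, the label LAN-192.168.0 and the column name "dhcp" are all made up for the example. It assumes the usual hosts.cfg layout (IP, name, then tags after a "#") and the usual status message form, where dots in the host name are sent as commas because a dot separates the host name from the test name.

```python
def hosts_cfg_line(label, ip="0.0.0.0", tags=("noconn",)):
    """Build one hosts.cfg line for a pseudo-host (a label, not a DNS name)."""
    return f"{ip} {label} # {' '.join(tags)}"


def status_message(label, test, color, text):
    """Build a Xymon status message reported against the pseudo-host.

    Dots in the label become commas, since the message uses a dot to
    separate the host name from the test (column) name.
    """
    host = label.replace(".", ",")
    return f"status {host}.{test} {color} {text}"


if __name__ == "__main__":
    # Hypothetical label and column name, for illustration only.
    print(hosts_cfg_line("LAN-192.168.0"))
    print(status_message("LAN-192.168.0", "dhcp", "green", "1 DHCP server seen"))
```

The message would then be sent with the normal xymon client command from whichever machine on that LAN runs the probe.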
> I suppose that would work, and if I put those pseudo-hosts in a separate page or section, would only have the DHCPprobe test (or similar tests) column for them.
It depends. Personally, I would put all the items relevant to the network 
on the same page, unless you have a DHCP "team" which is separate from 
the teams managing the other services/devices on the network. 
Using groups, you can keep the number of columns for different 
groups of devices/services limited, so you don't have super wide tables 
with lots of empty space.
> But is that (a) a reasonable thing to do, or just plain crazy, and (b) is that a reasonable abuse (if that's not an oxymoron) of the facilities xymon provides, or is there some better way to do that sort of thing (e.g. a test that's per-LAN or at any rate doesn't really relate to a single system as such)?
>
> Keep in mind that putting in entries specifically for the DHCP servers/relays themselves may not be helpful insofar the test is not truly just for them (why xymonnet can't do it), that one might want to have a variation to allow more than one legitimate DHCP server per LAN (redundant, perhaps with only static allocation tied to MAC addresses so they don't have to coordinate allocation from a dynamic address range), etc.  Not to mention that a DHCP server may be colocated on a router or other infrastructure box, which one might either not want to report at all, or report separately, and which might not support a client extension script, thus requiring the script to run elsewhere anyway.
I don't really understand the issue here. Generally, you would configure 
one (or more) servers (computer/router/device) to act as the DHCP 
server. So you know which device should be running DHCP; why not report 
the result there? If the DHCP service stops responding, how will you know where 
it is supposed to live, and hence which device you should start to look 
at? You might start on the router, and decide that DHCP config was lost, 
so you set it up again, then the next day someone reboots the actual 
DHCP server, and you now have two that are probably conflicting. If you 
can see DHCP is supposed to be on machine abx but it is down, then you 
can decide to either fix abx to get DHCP working, or worry about fixing 
abx later and configure DHCP on the router in the meantime.
> My case, with a single home LAN, is pretty trivial; but if there were more LANs, the script might end up hard-coding LAN names to use individually (each version of script different), unless it parsed ifconfig output to see what LAN it was on, and then consulted an entire table of LAN addresses (either hardcoded or retrieved from the server) to see what name to use for the LAN.  So writing the script to be scalable to multiple LANs without needing each instance tweaked would be harder.
You can store the needed information in xymon if you want, but yes, you 
will need to run the script from multiple hosts (since the host running 
the script needs to be on the local LAN you are testing). You would 
be best to have a config file/source for the actual test data. A simple 
option would be a second config file which holds the specific test 
details for the local device. Another would be to read the needed config 
data from the local device OS config files (eg /etc/network/interfaces). 
Finally, you could retrieve the data from xymon.
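
As a rough sketch of the "second config file" option, in Python: the file format (one "LABEL: ip [ip ...]" line per LAN), the function names and the choice of colours are all assumptions made up for this example. The probe output is only searched for the "Server Identifier:" lines you mentioned, so any number of responses is handled.

```python
import re

# Matches the "Server Identifier: <ip>" lines in the probe output.
SERVER_ID = re.compile(r"Server Identifier:\s*(\S+)")


def parse_expected(config_text):
    """Parse lines of the hypothetical form 'LABEL: ip [ip ...]'.

    Blank lines and anything after a '#' are ignored.
    """
    expected = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        label, _, servers = line.partition(":")
        expected[label.strip()] = set(servers.split())
    return expected


def evaluate(probe_output, allowed):
    """Return (color, text) for one LAN, given the probe output and the
    set of DHCP server addresses that are allowed on that LAN."""
    seen = set(SERVER_ID.findall(probe_output))
    if not seen:
        return "red", "no DHCP server answered"
    rogue = seen - allowed
    if rogue:
        return "red", "unexpected DHCP server(s): " + " ".join(sorted(rogue))
    missing = allowed - seen
    if missing:
        # Assumption: a missing redundant server is only a warning.
        return "yellow", "no answer from: " + " ".join(sorted(missing))
    return "green", "DHCP server(s) as expected: " + " ".join(sorted(seen))
```

The same evaluate() step works whether the allowed list comes from a local file, the OS config, or the xymon server; only the source of the text handed to parse_expected() changes.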
> Could xymon handle (pseudo-) hostnames that looked like LAN-192.168.0, or would it choke on those?
I don't see an issue; that would be a valid hostname (check the RFC for 
all the details).
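
If you want to sanity-check generated labels, here is a small sketch in Python of the usual hostname syntax rules (RFC 1123 style: dot-separated labels of letters, digits and hyphens, 1 to 63 characters each, no leading or trailing hyphen, at most 253 characters overall). The function name is made up, and this only checks syntax; it says nothing about what xymon itself accepts.

```python
import re

# One label: starts and ends with a letter or digit, hyphens allowed inside.
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name):
    """Return True if every dot-separated label follows hostname syntax."""
    if not name or len(name) > 253:
        return False
    return all(LABEL.fullmatch(part) for part in name.split("."))
```

With this check, LAN-192.168.0 passes, while a name containing an underscore or an empty label does not.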
> Names like that would require no bother for how to make them up (no table needed, could be generated from the interface IP and net mask), and would be informative...although it wouldn't work well for net masks that were other than /8, /16, or /24.
Why not? What is wrong with LAN-192.168.128 or LAN-10.12.64.0? Of 
course, you don't know if 10.12.64.0 is a /24 or a /23 or /22 etc. In 
fact, LAN-10.0.0.0 could be anything from a /30 to something less than 
/8. Either you make an assumption, or you will need to specify the 
netmask, or you need to collect the netmask from the local interface.
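
For the last option, a minimal sketch in Python using the standard ipaddress module: given the interface address and netmask (however you collect them), it derives the network and puts the prefix length in the label so nothing has to be assumed. The function name and the "LAN-<network>-<prefixlen>" naming scheme are just an illustration, and this is IPv4 only.

```python
import ipaddress


def lan_label(address, netmask, prefix="LAN-"):
    """Derive a label such as 'LAN-10.12.64.0-22' from an interface
    address and its netmask (dotted form or prefix length)."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    net = iface.network
    # Hypothetical naming scheme: network address, hyphen, prefix length.
    return f"{prefix}{net.network_address}-{net.prefixlen}"


if __name__ == "__main__":
    print(lan_label("10.12.65.7", "255.255.252.0"))
    print(lan_label("192.168.0.17", "24"))
```

Using a hyphen rather than a slash before the prefix length keeps the label in hostname syntax.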
>   And how the heck could one generate acceptable IPv6 LAN names that were similarly useful and digestible? :-)
I would suggest something like CompanyA.AU.NSW.Sydney.LAN; to me, that 
would be more informative. Clicking on the status would then show the 
config data including LAN IP address, and the test results.
> Even with auto-generated LAN names, there would still need to be a table of legitimate DHCP server(s) per LAN, so I suppose that would best be retrieved from the server.
Or whatever central configuration system you use... assuming you do use 
a central system to manage the configuration of all your hosts ;)
> Have I finally come up with an unusual enough idea to make someone's head hurt? :-)
>
Not really, it's a good test to have, but I don't see why the test 
result isn't just a column on the host that is currently configured to 
provide the service.
Personally, I'd also add a procs test for the process name, a ports test 
for the port, and also, if you wanted, a local check to see how many IPs 
are allocated/available/total, so you can see that a /24 was allowed for 
DHCP, but you have allocated 217, and perhaps you want to alert because 
you are "close" to exhausting the available allocation.
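
A rough sketch of that last check in Python. The function name and the 80%/95% thresholds are arbitrary assumptions, and how you obtain the allocated and total counts depends entirely on your DHCP server.

```python
def pool_status(allocated, total, warn_pct=80.0, crit_pct=95.0):
    """Return (color, text) describing how full a DHCP pool is.

    The thresholds are illustrative defaults, not recommended values.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    used = 100.0 * allocated / total
    if used >= crit_pct:
        color = "red"
    elif used >= warn_pct:
        color = "yellow"
    else:
        color = "green"
    return color, f"{allocated} of {total} leases allocated ({used:.1f}%)"


if __name__ == "__main__":
    # Assumes 254 usable addresses in the /24 pool.
    print(pool_status(217, 254))
```

With 217 of an assumed 254 addresses in use, this would report yellow under the thresholds above.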

Regards,
Adam