[Xymon] Xymon Digest, Vol 62, Issue 8

Agege Information Systems, Inc. cs at agege.com
Thu Mar 10 14:34:23 CET 2016


Greetings,  Is there any way to get a Xymon alert for a Windows server that has been running for more than 120 days?

-Agege
> On Mar 10, 2016, at 5:00 AM, xymon-request at xymon.com wrote:
> 
> Today's Topics:
> 
>   1. PORTS and STATE syntax (Boldt, David)
>   2. Re: jmxstat (Andy Smith)
>   3. The testip option does not seem to be honored by the http
>      test (Shawn Heisey)
>   4. Re: The testip option does not seem to be honored by the http
>      test (John Thurston)
>   5. How to change the refresh time for acknowledgements (john boris)
>   6. Re: How to change the refresh time for acknowledgements
>      (Ryan Novosielski)
>   7. Re: How to change the refresh time for acknowledgements
>      (Ryan Novosielski)
>   8. Re: How to change the refresh time for acknowledgements
>      (john boris)
>   9. Re: jmxstat (Galen Johnson)
>  10. Always purple history after time shift on server - how to	fix
>      (Andrey Chervonets)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 9 Mar 2016 09:09:16 -0500
> From: "Boldt, David" <dboldt at usgs.gov>
> To: <xymon at xymon.com>
> Subject: [Xymon] PORTS and STATE syntax
> Message-ID:
> 	<CAC_ry0YJfGtLA9sHeqLH6RnwRfMYeKbq+3S7EnS5AEBVsfcFng at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> I'm not successful filtering on the connection state associated with a port.
> None of the syntax variations I have tried have been successful.
> If I remove the STATE specifier, matches are found.
> 
> There are multiple hosts connecting to the same port:
> 
> ESTAB      0      0              10.160.8.130:61617         10.160.8.132:57765
> ESTAB      0      0              10.160.8.130:61617         10.160.8.132:57766
> ESTAB      0      0              10.160.8.130:61617         10.160.8.132:57768
> ESTAB      0      0              10.160.8.130:61617         10.160.8.133:45096
> ESTAB      0      0              10.160.8.130:61617         10.160.8.133:45104
> ESTAB      0      0              10.160.8.130:61617         10.160.8.133:45107
> ESTAB      0      0              10.160.8.130:61617          130.118.4.2:36141
> ESTAB      0      0              10.160.8.130:61617          130.118.4.2:36150
> ESTAB      0      0              10.160.8.130:61617          130.118.4.2:36151
> ESTAB      0      0              10.160.8.130:61617         136.177.16.3:34320
> ESTAB      0      0              10.160.8.130:61617         136.177.16.3:34321
> ESTAB      0      0              10.160.8.130:61617         136.177.16.3:34324
> ESTAB      0      0              10.160.8.130:61617       137.227.240.32:50726
> ESTAB      0      0              10.160.8.130:61617       137.227.240.32:50727
> ESTAB      0      0              10.160.8.130:61617       137.227.240.32:50729
> LISTEN     0      0                         *:61617                    *:*
> 
> I've set up several port monitoring specifications, but none of them
> match the state (the first example where no state is specified
> succeeds):
> 
> PORT LOCAL=%[:](61617) REMOTE=%10.160.8.132   MIN=3 MAX=3 COLOR=yellow TEXT=ActiveMQ-DHCP
> PORT LOCAL=%[:](61617) REMOTE=%10.160.8.133   STATE=ESTABLISHED MIN=3 MAX=3 COLOR=yellow TEXT=ActiveMQ-nsp.er
> PORT LOCAL=%[:](61617) REMOTE=%136.177.16.3   STATE=ESTAB MIN=3 MAX=3 COLOR=yellow TEXT=ActiveMQ-ns.cr
> PORT LOCAL=%[:](61617) REMOTE=%137.227.240.32 STATE=%ESTAB MIN=3 MAX=3 COLOR=yellow TEXT=ActiveMQ-ns.er
> PORT LOCAL=%[:](61617) REMOTE=%130.118.4.2    STATE=%ESTAB* MIN=3 MAX=3 COLOR=yellow TEXT=ActiveMQ-ns.wr
> 
> Note: On this server netstat does not exist and ss is being used.
> 
> 
> Observation: Discovering the syntax for REMOTE was trial and error.
> Specifying the IP address alone did not work, and I found no examples
> for the type of filtering above.
> 
> -- 
>                                         -- David Boldt
>                                            <dboldt at usgs.gov>
> 
> 
>   "Discovery consists of seeing what everybody has seen and thinking
> what nobody has thought."
>    --Albert Szent-Gyorgyi (1893 - 1986)
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 9 Mar 2016 16:36:44 +0000
> From: Andy Smith <abs at shadymint.com>
> To: Galen Johnson <Galen.Johnson at sas.com>, "xymon at xymon.com"
> 	<xymon at xymon.com>
> Subject: Re: [Xymon] jmxstat
> Message-ID:
> 	<CAFz9LfvL-9XVo5BOui2oDy4XXqRPmajP-yhAyfEfrObzxB6Y=g at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> ---------- Forwarded message ----------
>> From: Galen Johnson <Galen.Johnson at sas.com>
>> Date: 9 March 2016 at 00:26
>> Subject: [Possible Spam] [Xymon] jmxstat
>> To: "xymon at xymon.com" <xymon at xymon.com>
>> 
>> Hey,
>> 
>> I don't think this is specific to jmxstat, but that is what I'm trying to implement.  I'm getting the following in the apache log when I hit the page:
>> 
>> Setup error: Service GCInfo has a graph GCInfo, but no graph-definition,...
>> 
>> However, the definition does exist and I can view the graphs if I select a different service and change the name in the URL.  RRDs are being created.
>> 
>> Anyone else run into this and overcome it?
>> 
>> thanks
> 
> Not seen that before, but just check for me please, is GCInfo mentioned in
> both TEST2RRD and GRAPHS in xymonserver.cfg?
> --
> Andy
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 9 Mar 2016 10:53:08 -0700
> From: Shawn Heisey <hobbit at elyograg.org>
> To: xymon at xymon.com
> Subject: [Xymon] The testip option does not seem to be honored by the
> 	http	test
> Message-ID: <56E06304.6040301 at elyograg.org>
> Content-Type: text/plain; charset=utf-8
> 
> I have the following in my hosts.cfg file:
> 
> 10.100.2.131    fourqueens.REDACTED.com # testip ssh mgmt=10.2.6.131[http,https,ssh] https://megaagency.REDACTED.com delayred=http:10
> 10.100.2.132    fitzgeralds.REDACTED.com # testip ssh mgmt=10.2.6.132[http,https,ssh] https://megaagency.REDACTED.com delayred=http:10
> 
> The "mgmt" option controls a custom server-side script we wrote that
> verifies reachability of the out-of-band server management (Dell DRAC in
> this case).
> 
> I had expected the "testip" option to force the https URL test to be
> sent directly to the server, not the DNS address (which is a load
> balancer), but I can see the load balancer cookie in the response and
> requests in the load balancer's log.
> 
> Is there any way to get the intended behavior?
> 
> The server is running 4.3.23 with the patch to fix http response code
> interpretation.
> 
> Thanks,
> Shawn
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Wed, 09 Mar 2016 09:00:09 -0900
> From: John Thurston <john.thurston at alaska.gov>
> To: xymon at xymon.com
> Subject: Re: [Xymon] The testip option does not seem to be honored by
> 	the http test
> Message-ID: <56E064A9.300 at alaska.gov>
> Content-Type: text/plain; CHARSET=US-ASCII; format=flowed
> 
> On 3/9/2016 8:53 AM, Shawn Heisey wrote:
> - snip -
>> I had expected the "testip" option to force the https URL test to be
>> sent directly to the server, not the DNS address (which is a load
>> balancer), but I can see the load balancer cookie in the response and
>> requests in the load balancer's log.
> 
> The TESTIP controls the behavior of the CONN test. To make the HTTP test 
> use an IP address instead of resolving the name, use the following syntax:
>   http://www.sample.com=1.2.3.4/index.html
> 
> From the hosts.cfg man page:
> 
>> Testing sites by IP-address
>>    xymonnet ignores the "testip" tag normally used to force a test to use the IP-address from the hosts.cfg file instead of the hostname, when it performs http and https tests.
>>    The reason for this is that it interacts badly with virtual hosts, especially if these are IP-based as is common with https-websites.
>>    Instead the IP-address to connect to can be overridden by specifying it as:
>>            http://www.sample.com=1.2.3.4/index.html
>>    The "=1.2.3.4" will cause xymonnet to run the test against the IP-address "1.2.3.4", but still trying to access a virtual website with the name "www.sample.com".
>>    The "=ip.address.of.host" must be the last part of the hostname, so if you need to combine this with e.g. an explicit port number, it should be done as
>>            http://www.sample.com:3128=1.2.3.4/index.html
> 
> 
> -- 
>    Do things because you should, not just because you can.
> 
> John Thurston    907-465-8591
> John.Thurston at alaska.gov
> Enterprise Technology Services
> Department of Administration
> State of Alaska
> 
> 
> ------------------------------
> 
> Message: 5
> Date: Wed, 9 Mar 2016 13:03:13 -0500
> From: john boris <jborissr at gmail.com>
> To: xymon at xymon.com
> Subject: [Xymon] How to change the refresh time for acknowledgements
> Message-ID:
> 	<CAOk1TCyYkPqGEVNE=6eXDfOwpxd=_0_AneWivxn0WBjgaXdO0w at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> How can you change the response time when you acknowledge an issue, so that
> it shows up as a check and you don't get pinged repeatedly? It looks like
> it takes about 5 minutes for the acknowledgement to take place.
> 
> -- 
> John J. Boris, Sr.
> 
> ------------------------------
> 
> Message: 6
> Date: Wed, 9 Mar 2016 13:15:15 -0500
> From: Ryan Novosielski <novosirj at rutgers.edu>
> To: Xymon Mailing List <xymon at xymon.com>
> Subject: Re: [Xymon] How to change the refresh time for
> 	acknowledgements
> Message-ID: <1A609984-A1EA-4BE8-9D1F-0CE2ADC3CAE8 at rutgers.edu>
> Content-Type: text/plain; charset="utf-8"
> 
>> On Mar 9, 2016, at 1:03 PM, john boris <jborissr at gmail.com> wrote:
>> 
>> How can you change the response time when you acknowledge an issue so that it shows up as a check and you don't get pinged repeatedly. It looks like it takes about 5 minutes for the acknowledgement to take place.
> 
> I’m pretty sure that acknowledgement is more-or-less immediate, and what you’re talking about is the delay before the display is updated.
> 
> --
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS      |---------------------*O*---------------------
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>    `'
> 
> 
> ------------------------------
> 
> Message: 7
> Date: Wed, 9 Mar 2016 13:25:54 -0500
> From: Ryan Novosielski <novosirj at rutgers.edu>
> To: xymon at xymon.com
> Subject: Re: [Xymon] How to change the refresh time for
> 	acknowledgements
> Message-ID: <35311D00-8B33-45D2-8A1C-6FCB7BC2D3D0 at rutgers.edu>
> Content-Type: text/plain; charset="utf-8"
> 
> John,
> 
> Please keep replies “on list.”
> 
> I don’t know that there’s much that you can do about that as there is a sweep that goes on, each run of the xymonnet process that runs at whatever interval you’ve selected (the default being 5 mins). I think if you acknowledge and catch that process in the middle, you’re going to get the messages from that run. They might also even have been received already by your e-mail system, before you acknowledged it.
> 
> Someone else will know more than I do about the particulars here.
> 
>> On Mar 9, 2016, at 1:23 PM, john boris <jborissr at gmail.com> wrote:
>> 
>> Ryan,
>> I understand the web page takes some time to get changed but I have acknowledged issues and still receive notifications for a few minutes after I acknowledge it. I will have to check the time the next time I do this just to be sure of the lag time.
>> 
>> On Wed, Mar 9, 2016 at 1:15 PM, Ryan Novosielski <novosirj at rutgers.edu> wrote:
>>> On Mar 9, 2016, at 1:03 PM, john boris <jborissr at gmail.com> wrote:
>>> 
>>> How can you change the response time when you acknowledge an issue so that it shows up as a check and you don't get pinged repeatedly. It looks like it takes about 5 minutes for the acknowledgement to take place.
>> 
>> I’m pretty sure that acknowledgement is more-or-less immediate, and what you’re talking about is the delay before the display is updated.
> 
> --
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS      |---------------------*O*---------------------
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>     `'
> 
> 
> ------------------------------
> 
> Message: 8
> Date: Wed, 9 Mar 2016 13:34:00 -0500
> From: john boris <jborissr at gmail.com>
> To: Ryan Novosielski <novosirj at rutgers.edu>
> Cc: xymon at xymon.com
> Subject: Re: [Xymon] How to change the refresh time for
> 	acknowledgements
> Message-ID:
> 	<CAOk1TCwRzkaSdMJsFEMd3Uou35s_gbjgW_vJKhoeqP_SmOo3PA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Ryan,
> I thought my reply was going to the list but as I now see lovely gmail hid
> that from me.
> 
> On Wed, Mar 9, 2016 at 1:25 PM, Ryan Novosielski <novosirj at rutgers.edu>
> wrote:
> 
>> John,
>> 
>> Please keep replies “on list.”
>> 
>> I don’t know that there’s much that you can do about that as there is a
>> sweep that goes on, each run of the xymonnet process that runs at whatever
>> interval you’ve selected (the default being 5 mins). I think if you
>> acknowledge and catch that process in the middle, you’re going to get the
>> messages from that run. They might also even have been received already by
>> your e-mail system, before you acknowledged it.
>> 
>> Someone else will know more than I do about the particulars here.
>> 
>>> On Mar 9, 2016, at 1:23 PM, john boris <jborissr at gmail.com> wrote:
>>> 
>>> Ryan,
>>> I understand the web page takes some time to get changed but I have
>>> acknowledged issues and still receive notifications for a few minutes
>>> after I acknowledge it. I will have to check the time the next time I do
>>> this just to be sure of the lag time.
>>> 
>>> On Wed, Mar 9, 2016 at 1:15 PM, Ryan Novosielski <novosirj at rutgers.edu> wrote:
>>>> On Mar 9, 2016, at 1:03 PM, john boris <jborissr at gmail.com> wrote:
>>>> 
>>>> How can you change the response time when you acknowledge an issue so
>>>> that it shows up as a check and you don't get pinged repeatedly. It looks
>>>> like it takes about 5 minutes for the acknowledgement to take place.
>>> 
>>> I’m pretty sure that acknowledgement is more-or-less immediate, and what
>>> you’re talking about is the delay before the display is updated.
>> 
>> --
>> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>> || \\UTGERS      |---------------------*O*---------------------
>> ||_// Biomedical | Ryan Novosielski - Senior Technologist
>> || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
>> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>>     `'
>> 
> 
> 
> -- 
> John J. Boris, Sr.
> 
> ------------------------------
> 
> Message: 9
> Date: Wed, 9 Mar 2016 20:21:13 +0000
> From: Galen Johnson <Galen.Johnson at sas.com>
> To: Andy Smith <abs at shadymint.com>, "xymon at xymon.com"
> 	<xymon at xymon.com>
> Subject: Re: [Xymon] jmxstat
> Message-ID: <1457554873838.75260 at sas.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> It is in both places. 
> 
> =G=
> 
> ________________________________________
> From: hastymind at googlemail.com <hastymind at googlemail.com> on behalf of Andy Smith <abs at shadymint.com>
> Sent: Wednesday, March 9, 2016 11:36 AM
> To: Galen Johnson; xymon at xymon.com
> Subject: Re: [Xymon] jmxstat
> 
> ---------- Forwarded message ----------
>> From: Galen Johnson <Galen.Johnson at sas.com>
>> Date: 9 March 2016 at 00:26
>> Subject: [Possible Spam] [Xymon] jmxstat
>> To: "xymon at xymon.com" <xymon at xymon.com>
>> 
>> Hey,
>> 
>> I don't think this is specific to jmxstat, but that is what I'm trying to implement.  I'm getting the following in the apache log when I hit the page:
>> 
>> Setup error: Service GCInfo has a graph GCInfo, but no graph-definition,...
>> 
>> However, the definition does exist and I can view the graphs if I select a different service and change the name in the URL.  RRDs are being created.
>> 
>> Anyone else run into this and overcome it?
>> 
>> thanks
> 
> Not seen that before, but just check for me please, is GCInfo mentioned in
> both TEST2RRD and GRAPHS in xymonserver.cfg?
> --
> Andy
> 
> 
> ------------------------------
> 
> Message: 10
> Date: Thu, 10 Mar 2016 11:44:32 +0200
> From: Andrey Chervonets <A.Chervonets at cominder.eu>
> To: xymon at xymon.com
> Subject: [Xymon] Always purple history after time shift on server -
> 	how to	fix
> Message-ID:
> 	<OFD18FC1E3.6E49DB3D-ONC2257F72.002FE06C-C2257F72.0035834B at cominder.eu>
> 	
> Content-Type: text/plain; charset="utf-8"
> 
> I would like to share some hints on resolving a history reporting problem
> after a big time shift (about 4 hours) on the monitoring server.
> Maybe it will help someone else.
> 
> It happened some months ago, but I found time to fix it only today.
> What happened:
> 1. The time on the monitoring host jumped forward by 4 hours.
> 2. As a result, all metrics reported Purple status (this is intended
> functionality, but it would be nice if Xymon detected a big time shift and
> adapted its reporting in some way).
> 3. It was a problem at the virtual host provider; I reported it and the
> time was set back to the correct value.
> 4. To fix current reporting I cleaned some files under xymon/logs or
> acks (I really do not remember which ones right now). This reset the last
> status duration information, but the current values for all metrics became
> correct.
> 5. Everything became OK, except that when I checked the history
> ( ...xymon-cgi/history.sh? ...) for some metrics, Xymon always reported
> Purple for the last event (since the time of that incident).
> 
> 
> It affected only some metrics (not all), and I had a second monitoring server
> with the same information (without the time shift incident), so I was able
> to live with it for some months.
> 
> Solution: 
> Today I fixed that reporting problem with the following steps, which
> should be executed for every host-metric pair that has the problem.
> 
> We have to work with 2 files:
> 1) the host history file, like
> hist/HOSTNAME 
> # here we should find records with negative duration values like:
> svcs 1435410898 1435426055 -15157 gr pu 1
> who 1435410899 1435426055 -15156 gr pu 1
> msgs 1435410899 1435426055 -15156 gr pu 1
> netstat 1435410899 1435426055 -15156 gr pu 1
> memory 1435411034 1435426055 -15021 ye pu 2
> uptime 1435411140 1435426055 -14915 gr pu 1
> procs 1435411145 1435426055 -14910 gr pu 1
> disk 1435411150 1435426055 -14905 ye pu 2
> cpu 1435411222 1435426055 -14833 gr pu 1
> 
> # and drop them
> 
> 2) the service history file, like
> hist/HOSTNAME.svc
> # again, find records with negative duration values like:
> Sat Jun 27 20:27:35 2015 purple 1435426055 -15157
> 
> # and drop the record(s); there should really be just one
> 
> 
> To fix the reporting for just one service, it is enough to drop the
> negative-duration records from the service history file only (tested).
> But I do not see any reason to keep such records in the host history file, so
> I delete them from that file too.
> 
> How to automate the process:
> # find hist files with negative duration records
> # step 1: host history files (duration is field 4)
> find hist/ -type f -name "*.*" -print0 | xargs -0 grep " -" | awk '{print $1" :"$4}' | grep ":-"
> 
> #output like:
> ...
> hist/idc-oracle03.msc-sh.local:ssh :-14862
> hist/idc-oracle03.msc-sh.local:dblock :-15012
> hist/idc-oracle03.msc-sh.local:dbrec :-15012
> hist/idc-oracle03.msc-sh.local:dbup :-15011
> hist/idc-oracle03.msc-sh.local:dbext :-14989
> ...
> 
> # step 2: service history files (duration is field 8)
> find hist/ -type f -name "*.*" -print0 | xargs -0 grep " -" | awk '{print $1" :"$8}' | grep ":-"
> # output like:
> ..
> hist/idc-oracle03,domain.com.dbrec:Sat :-15012
> hist/gdc-oracle03,domain.com.dbup:Sat :-15136
> hist/idc-oracle01,domain.com.disk:Sat :-14961
> hist/gdc-oracle01,domain.com.dbaud:Thu :-26793
> hist/gdc-oracle01,domain.com.dbaud:Sat :-14940
> ..
> 
> The record removal can then be automated too.
> 
> 
> Best regards,
> 
> Andrey Chervonets
> ----------------------
> SIA CoMinder
> http://www.cominder.eu/
> 
> 
> 
> ------------------------------
> 
> Subject: Digest Footer
> 
> _______________________________________________
> Xymon mailing list
> Xymon at xymon.com
> http://lists.xymon.com/mailman/listinfo/xymon
> 
> 
> ------------------------------
> 
> End of Xymon Digest, Vol 62, Issue 8
> ************************************
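
On the last point of Message 10 (automating the record removal): below is a minimal sketch of how that could look, not a tested procedure. It assumes the two record layouts shown in that message, with the duration in field 4 of 7 in the host history file and in field 8 of 8 in the service history file. The function names clean_hist_file and clean_hist_dir and the .bak backup copies are made up for this example. It is probably safest to run something like this while Xymon is stopped, so that nothing is writing to the history files.

```shell
#!/bin/sh
# Sketch only: remove history records with a negative duration.
# Assumed record layouts, taken from the samples in Message 10:
#   host file    (7 fields): svc start end DURATION oldcolor newcolor n
#   service file (8 fields): Dow Mon DD HH:MM:SS YYYY color timestamp DURATION

# Rewrite one history file without its negative-duration records.
# The untouched original is kept next to it as FILE.bak (hypothetical naming).
clean_hist_file() {
    file=$1
    cp -p "$file" "$file.bak" || return 1
    awk '!((NF == 7 && $4 ~ /^-[0-9]+$/) || (NF == 8 && $8 ~ /^-[0-9]+$/))' \
        "$file.bak" > "$file"
}

# Clean every regular file under the given history directory that
# contains at least one " -<digits>" candidate; earlier backups are skipped.
clean_hist_dir() {
    dir=$1
    find "$dir" -type f ! -name '*.bak' -exec grep -lE -e ' -[0-9]+' {} + |
    while IFS= read -r file; do
        clean_hist_file "$file"
    done
}
```

After sourcing the functions, usage would be something like: clean_hist_dir /path/to/xymon/data/hist (the path is a placeholder). Matching on field count and position, rather than on a bare " -", is meant to avoid dropping lines that merely contain a dash somewhere else.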



