[Xymon] logfetch issue - sending old data causing false alerts
Max Xu
Max.Xu at servicenow.com
Wed Jun 21 20:49:57 CEST 2017
Hi,
logfetch is sending old data, which causes false alerts.
The log file looks somewhat like this:
Error 2017-06-14 11:36:58.613343 39915 2184308576 Compare server: ……
Error 2017-06-14 11:36:58.613481 39913 1581872992 Command server: ……
…… (Note: the above lines repeat about 780K times.)
Info 2017-06-14 13:07:41.113163 1193 1036199776 Compare server exited normally, pid = 45494 [sp_desvr]
…..
Error 2017-06-15 02:42:22.820068 1761 2399766368 Command server:…..
……
On 6/19 and 6/20, msgs alerts were generated containing all the old data from 6/14, 6/15, etc. Below is a snippet of the alert from 6/19:
Mon Jun 19 17:48:57 PDT 2017 - Log files NOT ok
[red] Critical entries in /u01/shareplex/var/log/event_log <https://monitor01.lhr9.service-now.com/xymon-cgi/svcstatus.sh?CLIENT=ora164106.sjc4.service-now.com&SECTION=msgs:/u01/shareplex/var/log/event_log>
[red] Error 2017-06-14 12:07:24.545252 9795 1581102944 Command server: ReconcileLog: failed to construct object-cache: Illegal state: Item 372354 already in the object id registry (connecting from ora164106.sjc4.service-now.com) [module osp]
[red] Error 2017-06-14 12:07:24.545499 9795 1581102944 Command server: ReconcileLog: failed to construct object-cache: Illegal state: Item 372356 already in the object id registry (connecting from ora164106.sjc4.service-now.com) [module osp]
Meanwhile, xymonclient.log shows:
2017-06-19 17:49:01.428381 logfetch: File /u01/shareplex/var/log/event_log shrank from >=173538314 to 48414720 bytes in size. Probably rotated; clearing position state
2017-06-19 17:49:01.428462 logfetch: /u01/shareplex/var/log/event_log delta 48414720 bytes exceeds max buffer size 10485760; skipping some data
2017-06-19 17:51:05.086815 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
2017-06-19 17:53:09.134469 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
2017-06-19 17:55:12.647682 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
2017-06-19 17:57:16.163913 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
2017-06-19 17:59:19.662801 logfetch: /u01/shareplex/var/log/event_log delta 173538314 bytes exceeds max buffer size 10485760; skipping some data
2017-06-19 18:01:23.180499 logfetch: /u01/shareplex/var/log/event_log delta 173538453 bytes exceeds max buffer size 10485760; skipping some data
2017-06-19 18:03:26.777636 logfetch: /u01/shareplex/var/log/event_log delta 125123733 bytes exceeds max buffer size 10485760; skipping some data
2017-06-20 06:42:01.519481 logfetch: File /u01/shareplex/var/log/event_log shrank from >=173541482 to 74420224 bytes in size. Probably rotated; clearing position state
2017-06-20 06:42:01.519557 logfetch: /u01/shareplex/var/log/event_log delta 74420224 bytes exceeds max buffer size 10485760; skipping some data
2017-06-20 06:44:05.173606 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
2017-06-20 06:46:08.670466 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
2017-06-20 06:48:12.188216 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
2017-06-20 06:50:15.683455 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
2017-06-20 06:52:19.250727 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
2017-06-20 06:54:22.752463 logfetch: /u01/shareplex/var/log/event_log delta 173541633 bytes exceeds max buffer size 10485760; skipping some data
2017-06-20 06:56:23.426678 logfetch: /u01/shareplex/var/log/event_log delta 99121409 bytes exceeds max buffer size 10485760; skipping some data
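The numbers in the log excerpt above can be cross-checked with a little arithmetic. The sketch below is only a reading of the logged figures, not logfetch's actual code: `implied_offset` is a hypothetical helper, and it assumes that each reported delta is the file size minus the offset reading started from. For the last line of each episode the file size is not logged, so the size from the preceding line is assumed.

```python
def implied_offset(file_size, delta):
    """Offset reading must have started from, assuming delta == file_size - offset.

    Hypothetical helper for checking the log figures; not part of logfetch.
    """
    return file_size - delta


# 2017-06-19: deltas equal to the full file size imply a start offset of 0,
# i.e. the whole file (including the 6/14 and 6/15 entries) is in scope again.
print(implied_offset(173538314, 173538314))  # 0

# 2017-06-19 18:03: assuming the size was still 173538453 (from the 18:01 line),
# the last delta implies an offset equal to the "shrunk to" size.
print(implied_offset(173538453, 125123733))  # 48414720

# 2017-06-20 06:56: same pattern, assuming the size was still 173541633.
print(implied_offset(173541633, 99121409))   # 74420224
```

Under those assumptions, the final delta in each episode equals the full size minus the size reported in the "shrank" message. This is consistent with the position state being reset to the start of the file for several runs, but it does not by itself show why.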
Notes:
1. The 2-minute interval above is how my Xymon client is configured.
2. It seems the logfetch status file was not saved successfully, and the source code shows no error check for this (so there is no direct evidence).
3. The behavior lasted under 20 minutes each time. The server itself had no disk or CPU alerts, and no one reported any disk or I/O issues.
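To illustrate the kind of error check that note 2 says is missing, here is a minimal sketch of saving a position file atomically and surfacing any write failure. It is not logfetch's implementation: the function names `save_status` and `load_status` and the `path:offset` line format are assumptions made up for this example.

```python
import os
import tempfile


def save_status(path, positions):
    """Atomically write {logfile: offset} to path; raise OSError on any failure."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the rename is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".status-")
    try:
        with os.fdopen(fd, "w") as fh:
            for logfile, offset in positions.items():
                fh.write(f"{logfile}:{offset}\n")
            fh.flush()
            os.fsync(fh.fileno())  # make sure the data reached the disk
        os.replace(tmp, path)      # the old file stays intact until this succeeds
    except OSError:
        # Clean up the temporary file and let the caller see the error.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise


def load_status(path):
    """Read back {logfile: offset}; a missing file yields an empty dict."""
    positions = {}
    try:
        with open(path) as fh:
            for line in fh:
                logfile, _, offset = line.rstrip("\n").rpartition(":")
                if logfile and offset.isdigit():
                    positions[logfile] = int(offset)
    except FileNotFoundError:
        pass
    return positions
```

With this pattern a failed write leaves the previous status file in place instead of a truncated or missing one, and the caller gets an exception it can log.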
I was told that this behavior is not new, although it rarely happens. Is there any solution or workaround?
My running version is:
Xymon version 4.3.25-1.el6.terabithia
Thanks,
-max