[Xymon] Wierd scrambled/garbled disk data
Scot Kreienkamp
Scot.Kreienkamp at la-z-boy.com
Thu Dec 4 16:18:17 CET 2014
Juergen,
I've seen this too. I hit the same issue occasionally, but once it starts it won't go away until the server, or at least the BBWin service, is restarted. Unfortunately, the only thing I can suggest is to put a delayred of at least 2-3 polling cycles on your client for the disk test.
If I remember correctly, BBWin has some memory issues; I'm not sure whether that's what we're running into. BBWin is abandonware, so you get what you get with it, and the Xymon PS client is too heavy on processing to be usable in a production environment. I did manage to build a lightweight centralized client poller via SNMP some time ago, but it was so processor-heavy that it's not usable beyond 5-10 clients.
Bottom line: right now there's no good way to monitor Windows clients with Xymon that I'm aware of.
Scot Kreienkamp
From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Juergen Fischer
Sent: Thursday, December 04, 2014 5:24 AM
To: xymon at xymon.com
Subject: [Xymon] Wierd scrambled/garbled disk data
Hello, I have been fighting this problem for a few weeks now:
Occasionally we receive false disk alarms like the following:
yellow Thu Nov 27 08:44:51 2014 - Filesystems NOT ok
&red C (29143392% used) has reached the PANIC level (95%)
Filesystem 1K-blocks Used Avail Capacity Total Size Free Space Type Mount Poin
C ] 52428796 23285404 29143392 44% 50.00 GB 27.79 GB FIXED N/A
D
90906620 47401100 43505520 52% ]
86.70 GB 41.49 GB ]
FIXED N/A
E 28668]
84 14513420 272167664 5% ]
273.40 GB 259.56 ]
FIXED N/A
Status unchanged in 0.00 minutes
Message received from 10.158.10.1
Client data ID 1417074295
Note that the message contains ] and newline characters inserted at arbitrary positions, so drive letters and drive sizes may appear in the wrong positions compared to a normal disk message. One might reason as follows: in the above example, the inserted ] shifts ' 29143392' on the C-drive line one position to the right, to where the usage percentage normally appears. Xymon then interprets ' 29143392' as the usage percentage and raises an alarm like the one above: &red C (29143392% used)....
(PS: I just spotted that the 1st line says "yellow ...", but the 2nd says "&red ..." - no idea why - but let's ignore this for now!)
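To illustrate the column shift described above, here is a minimal Python sketch. It is not Xymon's actual parsing code: `naive_capacity` is a hypothetical helper that simply assumes the fifth whitespace-separated field is the Capacity column. The "normal" line is the C line from the example with the stray ] removed.

```python
def naive_capacity(line):
    """Hypothetical parser: assume field 0 is the drive and field 4 is Capacity."""
    fields = line.split()
    return fields[0], fields[4].rstrip("%")


# The C line as it should look, and as it arrived with a stray "]" inserted.
normal = "C 52428796 23285404 29143392 44% 50.00 GB 27.79 GB FIXED N/A"
garbled = "C ] 52428796 23285404 29143392 44% 50.00 GB 27.79 GB FIXED N/A"

print(naive_capacity(normal))   # ('C', '44')
print(naive_capacity(garbled))  # ('C', '29143392')
```

With the extra token in the line, every later field moves one position to the right, so the Avail value lands where the percentage is expected.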
So my first guess was that the BBWin clients send these corrupted disk messages, and to prove this I tried to capture such data on the client side. So far I have not managed to catch an example. However, when it occurs I can easily list the latest examples on the Xymon server by grepping for huge percentages in the histlogs directory:
jfische2[Xymon(mewappfrk071v)]:/var/lib/xymon/histlogs>ls -lt `grep -l ' ([0-9][0-9][0-9][0-9]*% used' */disk/* `
-rw-rw-r-- 1 hobbit hobbit 548 Dec 4 08:59 MEWAPPFRK092V/disk/Thu_Dec_4_08:59:11_2014
-rw-rw-r-- 1 hobbit hobbit 1691 Dec 2 12:53 MEWDBSFRK462V/disk/Tue_Dec_2_12:53:41_2014
-rw-rw-r-- 1 hobbit hobbit 885 Dec 2 10:39 MEWDBSFRK350V/disk/Tue_Dec_2_10:39:07_2014
-rw-rw-r-- 1 hobbit hobbit 598 Dec 1 13:48 MEWDC8BOT001/disk/Mon_Dec_1_13:48:33_2014
-rw-rw-r-- 1 hobbit hobbit 722 Dec 1 12:00 MEWDC8SAL001/disk/Mon_Dec_1_12:00:05_2014
-rw-rw-r-- 1 hobbit hobbit 604 Nov 28 14:50 MEWDC8AMO001/disk/Fri_Nov_28_14:50:03_2014
-rw-rw-r-- 1 hobbit hobbit 665 Nov 28 13:58 MEWDC8AMO001/disk/Fri_Nov_28_13:58:15_2014
-rw-rw-r-- 1 hobbit hobbit 713 Nov 28 10:36 MEWDC8MAD001/disk/Fri_Nov_28_10:36:09_2014
-rw-rw-r-- 1 hobbit hobbit 658 Nov 28 09:59 MEWDC8TOU001/disk/Fri_Nov_28_09:59:49_2014
-rw-rw-r-- 1 hobbit hobbit 658 Nov 28 09:49 MEWDC8STE001/disk/Fri_Nov_28_09:49:28_2014
-rw-rw-r-- 1 hobbit hobbit 656 Nov 28 08:16 MEWDC8STE001/disk/Fri_Nov_28_08:16:11_2014
-rw-rw-r-- 1 hobbit hobbit 778 Nov 27 14:32 MEWDC8FEL001/disk/Thu_Nov_27_14:32:32_2014
-rw-rw-r-- 1 hobbit hobbit 593 Nov 27 10:33 MEWDC8SAL001/disk/Thu_Nov_27_10:33:49_2014
-rw-rw-r-- 1 hobbit hobbit 591 Nov 27 09:41 MEWDC8MAD001/disk/Thu_Nov_27_09:41:57_2014
-rw-rw-r-- 1 hobbit hobbit 603 Nov 27 09:36 MEWDC8AVE001/disk/Thu_Nov_27_09:36:46_2014
-rw-rw-r-- 1 hobbit hobbit 658 Nov 27 08:50 MEWDC8STE001/disk/Thu_Nov_27_08:50:07_2014
-rw-rw-r-- 1 hobbit hobbit 597 Nov 27 08:44 MEWDC8LUB001/disk/Thu_Nov_27_08:44:55_2014
-rw-rw-r-- 1 hobbit hobbit 658 Nov 27 08:39 MEWDC8STE001/disk/Thu_Nov_27_08:39:45_2014
-rw-rw-r-- 1 hobbit hobbit 554 Nov 27 08:29 MEWAPPFRK974V/disk/Thu_Nov_27_08:29:25_2014
-rw-rw-r-- 1 hobbit hobbit 787 Nov 26 16:45 MEWDC8AMO001/disk/Wed_Nov_26_16:45:05_2014
-rw-rw-r-- 1 hobbit hobbit 885 Nov 26 16:19 MEWDBSFRK341V/disk/Wed_Nov_26_16:19:15_2014
-rw-rw-r-- 1 hobbit hobbit 1692 Nov 26 15:37 MEWDBSFRK405/disk/Wed_Nov_26_15:37:46_2014
-rw-rw-r-- 1 hobbit hobbit 674 Nov 25 17:07 MEWDBSFRK419V/disk/Tue_Nov_25_17:07:49_2014
-rw-rw-r-- 1 hobbit hobbit 729 Nov 25 13:50 MEWAPPFRK760V/disk/Tue_Nov_25_13:50:33_2014
-rw-rw-r-- 1 hobbit hobbit 584 Nov 25 12:48 MEWAPPFRK910V/disk/Tue_Nov_25_12:48:18_2014
-rw-rw-r-- 1 hobbit hobbit 605 Nov 25 12:22 MEWDC8WIE010/disk/Tue_Nov_25_12:22:28_2014
-rw-rw-r-- 1 hobbit hobbit 597 Nov 25 08:08 MEWDC8LUB001/disk/Tue_Nov_25_08:08:14_2014
-rw-rw-r-- 1 hobbit hobbit 662 Nov 24 16:39 MEWDC8SPY001/disk/Mon_Nov_24_16:39:24_2014
-rw-rw-r-- 1 hobbit hobbit 726 Nov 24 14:56 MEWDC8SCW001/disk/Mon_Nov_24_14:56:12_2014
-rw-rw-r-- 1 hobbit hobbit 1967 Nov 24 14:45 MEWDBSFRK461V/disk/Mon_Nov_24_14:45:53_2014
-rw-rw-r-- 1 hobbit hobbit 842 Nov 24 14:04 MEWDBSFRK439V/disk/Mon_Nov_24_14:04:27_2014
-rw-rw-r-- 1 hobbit hobbit 697 Nov 24 13:28 MEWAPPFRK763V/disk/Mon_Nov_24_13:28:18_2014
-rw-rw-r-- 1 hobbit hobbit 596 Nov 24 09:56 MEWDC8TOU001/disk/Mon_Nov_24_09:56:02_2014
-rw-rw-r-- 1 hobbit hobbit 911 Nov 21 15:17 MEWDBSFRK355V/disk/Fri_Nov_21_15:17:35_2014
-rw-rw-r-- 1 hobbit hobbit 490 Nov 21 12:37 MEWBCSFRK122V/disk/Fri_Nov_21_12:37:22_2014
-rw-rw-r-- 1 hobbit hobbit 894 Nov 21 07:51 MEWDBSFRK419V/disk/Fri_Nov_21_07:51:19_2014
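The same search can be sketched in Python, which also extracts the bogus percentage for each hit. This is only an illustration under assumptions: `find_suspect_logs` is a hypothetical helper, and it assumes the `<host>/disk/<timestamp>` layout visible in the listing above. Like the grep pattern, it matches three or more digits before "% used".

```python
import re
from pathlib import Path

# Same idea as the grep pattern: " (" followed by 3+ digits and "% used".
SUSPECT = re.compile(r" \((\d{3,})% used")


def find_suspect_logs(histlogs_root):
    """Return (path, percentage) pairs for disk logs with implausible usage values."""
    hits = []
    for path in sorted(Path(histlogs_root).glob("*/disk/*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains odd bytes.
        text = path.read_text(errors="replace")
        match = SUSPECT.search(text)
        if match:
            hits.append((path, int(match.group(1))))
    return hits


if __name__ == "__main__":
    for path, percent in find_suspect_logs("/var/lib/xymon/histlogs"):
        print(f"{path}: {percent}% used")
```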
These are all Windows servers (most are Windows 2008, a few are Windows 2003).
Now the latest example, from today, struck me:
red ]hu Dec 04 08:59:08 2014 - Filesystems NOT ok
&red 2427740 (2427740% used) has reached the PANIC level (95%)
&red 425724 (425724% used) has reached the PANIC level (95%)
Filesystem 1K-blocks Used
Avail Capacity Mounted Summary(]
tal\Avail)
C ]
2427740 32463796 19963944 ]
61% /FIXED/C 49.10]
\19.40gb
D ]
425724 6310604 46115120 12% ]
/FIXED/D 49.10gb\43.1]
gb
Status unchanged in 0.00 minutes
Message received from 10.30.99.92
Client data ID 1417679951
Here the 1st line reads "red ]hu ...". I reckon that this line was generated on the Xymon server and not on the client, because it contains the server-side (central mode) interpretation of the client data. And even this Xymon-generated line now contains a ]! That suggests the clients are fine and the problem is on the Xymon server. This is in line with my other observation on the above example: it says e.g. "&red 2427740 (2427740% used)". So it has interpreted '2427740' as the drive letter, which is easily explained by '2427740' appearing at position 1 in one of the message lines. But why does the same number appear again as the usage percentage ('2427740% used')? To me it really looks as if the Xymon server program is confused.
Our environment:
Xymon 4.3.10
Red Hat Enterprise Linux ES release 4 (Nahant)
OS=32bit
Hardware=i686
I have already searched the mailing list for 'garbled' problems and found some, but none of them seemed to match my problem, because they dealt with truncated messages. My messages are not truncated; they are garbled as in the above examples. Nevertheless, the recommendation there was to increase the MAX parameters, but I think ours are already high and I don't believe they are causing our problems. Our MAX settings are:
xymonserver.cfg:MAXMSGSPERCOMBO="100"
xymonserver.cfg:WMLMAXCHARS="1500" # Max number of bytes in a WAP message
xymonserver.cfg:BBMAXMSGSPERCOMBO="$MAXMSGSPERCOMBO"
xymonserver.cfg:MAXLINE="32768"
xymonserver.cfg:MAXMSG_DATA="10485760"
xymonserver.cfg:MAXMSG_STACHG="10485760"
xymonserver.cfg:MAXMSG_STATUS="10485760"
xymonserver.cfg:MAXMSG_NOTES="10485760"
xymonserver.cfg:MAXMSG_CLIENT="10485760"
xymonserver.cfg:MAXMSG_ENADIS="10485760"
xymonserver.cfg:MAXMSG_CLICHG="10485760"
Does anyone have a clue?
Thanks so much
Jürgen