[Xymon] Wierd scrambled/garbled disk data

Scot Kreienkamp Scot.Kreienkamp at la-z-boy.com
Thu Dec 4 16:18:17 CET 2014


Juergen,

I've seen this too. I hit the same issue occasionally, but once it starts it doesn't go away until the server, or at least the BBWin service, is restarted. Unfortunately, the only thing I can suggest is to put a delayred of at least 2-3 polling cycles on the disk test for your client.

If I remember correctly BBWin has some memory issues, not sure if that's what we're running into.  BBWin is abandonware so you get what you get with it, and the Xymon PS client is too heavy on processing to be usable in a production environment.  I was successful in making a lightweight centralized client poller via SNMP some time ago, but it was so processor heavy it's not usable beyond 5-10 clients.

Bottom line: right now there's no good way to monitor Windows clients with Xymon that I'm aware of.

Scot Kreienkamp

From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Juergen Fischer
Sent: Thursday, December 04, 2014 5:24 AM
To: xymon at xymon.com
Subject: [Xymon] Wierd scrambled/garbled disk data

Hello, I have been fighting this problem for a few weeks now:

Occasionally we receive false disk alarms like the following:

yellow Thu Nov 27 08:44:51 2014 - Filesystems NOT ok
&red C (29143392% used) has reached the PANIC level (95%)

Filesystem   1K-blocks     Used       Avail    Capacity   Total Size   Free Space   Type    Mount Poin

C     ]       52428796   23285404   29143392    44%         50.00 GB     27.79 GB   FIXED   N/A
D
    90906620   47401100   43505520    52%   ]
    86.70 GB     41.49 GB ]
FIXED   N/A
E            28668]
84   14513420  272167664     5% ]
     273.40 GB    259.56 ]
   FIXED   N/A
Status unchanged in 0.00 minutes
Message received from 10.158.10.1
Client data ID 1417074295

Note that the message contains ] and newline characters inserted at arbitrary positions, so drive letters and drive sizes can end up in the wrong positions compared to a normal disk message. In the example above, the inserted ] shifts ' 29143392' in the C-drive line one position to the right, into the place where the usage percentage normally appears. Xymon then interprets ' 29143392' as the usage percentage and raises the alarm shown above: &red C (29143392% used)....
(PS: I just spotted that the 1st line says "yellow ...", but the 2nd says "&red ..."  - no idea why - but let's ignore this for now!)
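
The column shift described above can be reproduced with a small sketch. This is an illustration only, not Xymon's actual parser: it assumes the capacity is simply the fifth whitespace-separated field of a disk line, and the function name capacity_field is made up for this example.

```python
# Illustration only: NOT Xymon's real parser. It assumes the capacity is
# the fifth whitespace-separated field of a disk line.

def capacity_field(line: str) -> str:
    """Return the token found where the Capacity column is expected."""
    fields = line.split()
    return fields[4]

# The C-drive line as it should look, and with the stray "]" from the alarm.
normal = "C             52428796   23285404   29143392    44%   50.00 GB   27.79 GB   FIXED   N/A"
garbled = "C     ]       52428796   23285404   29143392    44%   50.00 GB   27.79 GB   FIXED   N/A"

print(capacity_field(normal))   # 44%
print(capacity_field(garbled))  # 29143392
```

With the extra ] counted as a field, the Avail value lands where the percentage is expected, which matches the "29143392% used" in the alarm.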

So my first guess was that the BBWin clients send these corrupted disk messages, and to prove this I tried to catch such data on the client side. So far I have not managed to catch an example there. However, when it occurs I can easily list the latest examples on the Xymon server by grepping for huge percentages in the histlogs directory:

jfische2[Xymon(mewappfrk071v)]:/var/lib/xymon/histlogs>ls -lt `grep -l ' ([0-9][0-9][0-9][0-9]*% used' */disk/* `
-rw-rw-r--  1 hobbit hobbit  548 Dec  4 08:59 MEWAPPFRK092V/disk/Thu_Dec_4_08:59:11_2014
-rw-rw-r--  1 hobbit hobbit 1691 Dec  2 12:53 MEWDBSFRK462V/disk/Tue_Dec_2_12:53:41_2014
-rw-rw-r--  1 hobbit hobbit  885 Dec  2 10:39 MEWDBSFRK350V/disk/Tue_Dec_2_10:39:07_2014
-rw-rw-r--  1 hobbit hobbit  598 Dec  1 13:48 MEWDC8BOT001/disk/Mon_Dec_1_13:48:33_2014
-rw-rw-r--  1 hobbit hobbit  722 Dec  1 12:00 MEWDC8SAL001/disk/Mon_Dec_1_12:00:05_2014
-rw-rw-r--  1 hobbit hobbit  604 Nov 28 14:50 MEWDC8AMO001/disk/Fri_Nov_28_14:50:03_2014
-rw-rw-r--  1 hobbit hobbit  665 Nov 28 13:58 MEWDC8AMO001/disk/Fri_Nov_28_13:58:15_2014
-rw-rw-r--  1 hobbit hobbit  713 Nov 28 10:36 MEWDC8MAD001/disk/Fri_Nov_28_10:36:09_2014
-rw-rw-r--  1 hobbit hobbit  658 Nov 28 09:59 MEWDC8TOU001/disk/Fri_Nov_28_09:59:49_2014
-rw-rw-r--  1 hobbit hobbit  658 Nov 28 09:49 MEWDC8STE001/disk/Fri_Nov_28_09:49:28_2014
-rw-rw-r--  1 hobbit hobbit  656 Nov 28 08:16 MEWDC8STE001/disk/Fri_Nov_28_08:16:11_2014
-rw-rw-r--  1 hobbit hobbit  778 Nov 27 14:32 MEWDC8FEL001/disk/Thu_Nov_27_14:32:32_2014
-rw-rw-r--  1 hobbit hobbit  593 Nov 27 10:33 MEWDC8SAL001/disk/Thu_Nov_27_10:33:49_2014
-rw-rw-r--  1 hobbit hobbit  591 Nov 27 09:41 MEWDC8MAD001/disk/Thu_Nov_27_09:41:57_2014
-rw-rw-r--  1 hobbit hobbit  603 Nov 27 09:36 MEWDC8AVE001/disk/Thu_Nov_27_09:36:46_2014
-rw-rw-r--  1 hobbit hobbit  658 Nov 27 08:50 MEWDC8STE001/disk/Thu_Nov_27_08:50:07_2014
-rw-rw-r--  1 hobbit hobbit  597 Nov 27 08:44 MEWDC8LUB001/disk/Thu_Nov_27_08:44:55_2014
-rw-rw-r--  1 hobbit hobbit  658 Nov 27 08:39 MEWDC8STE001/disk/Thu_Nov_27_08:39:45_2014
-rw-rw-r--  1 hobbit hobbit  554 Nov 27 08:29 MEWAPPFRK974V/disk/Thu_Nov_27_08:29:25_2014
-rw-rw-r--  1 hobbit hobbit  787 Nov 26 16:45 MEWDC8AMO001/disk/Wed_Nov_26_16:45:05_2014
-rw-rw-r--  1 hobbit hobbit  885 Nov 26 16:19 MEWDBSFRK341V/disk/Wed_Nov_26_16:19:15_2014
-rw-rw-r--  1 hobbit hobbit 1692 Nov 26 15:37 MEWDBSFRK405/disk/Wed_Nov_26_15:37:46_2014
-rw-rw-r--  1 hobbit hobbit  674 Nov 25 17:07 MEWDBSFRK419V/disk/Tue_Nov_25_17:07:49_2014
-rw-rw-r--  1 hobbit hobbit  729 Nov 25 13:50 MEWAPPFRK760V/disk/Tue_Nov_25_13:50:33_2014
-rw-rw-r--  1 hobbit hobbit  584 Nov 25 12:48 MEWAPPFRK910V/disk/Tue_Nov_25_12:48:18_2014
-rw-rw-r--  1 hobbit hobbit  605 Nov 25 12:22 MEWDC8WIE010/disk/Tue_Nov_25_12:22:28_2014
-rw-rw-r--  1 hobbit hobbit  597 Nov 25 08:08 MEWDC8LUB001/disk/Tue_Nov_25_08:08:14_2014
-rw-rw-r--  1 hobbit hobbit  662 Nov 24 16:39 MEWDC8SPY001/disk/Mon_Nov_24_16:39:24_2014
-rw-rw-r--  1 hobbit hobbit  726 Nov 24 14:56 MEWDC8SCW001/disk/Mon_Nov_24_14:56:12_2014
-rw-rw-r--  1 hobbit hobbit 1967 Nov 24 14:45 MEWDBSFRK461V/disk/Mon_Nov_24_14:45:53_2014
-rw-rw-r--  1 hobbit hobbit  842 Nov 24 14:04 MEWDBSFRK439V/disk/Mon_Nov_24_14:04:27_2014
-rw-rw-r--  1 hobbit hobbit  697 Nov 24 13:28 MEWAPPFRK763V/disk/Mon_Nov_24_13:28:18_2014
-rw-rw-r--  1 hobbit hobbit  596 Nov 24 09:56 MEWDC8TOU001/disk/Mon_Nov_24_09:56:02_2014
-rw-rw-r--  1 hobbit hobbit  911 Nov 21 15:17 MEWDBSFRK355V/disk/Fri_Nov_21_15:17:35_2014
-rw-rw-r--  1 hobbit hobbit  490 Nov 21 12:37 MEWBCSFRK122V/disk/Fri_Nov_21_12:37:22_2014
-rw-rw-r--  1 hobbit hobbit  894 Nov 21 07:51 MEWDBSFRK419V/disk/Fri_Nov_21_07:51:19_2014
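
To see which hosts are hit most often, the same search can be sketched in Python. This is a rough sketch under assumptions: it uses the histlogs path and the <host>/disk/<timestamp> layout visible in the listing above, and the helper names bogus_percentages and count_by_host are invented here. Like the grep, it flags any "used" percentage with three or more digits, so a genuine 100% would also match.

```python
import re
from collections import Counter
from pathlib import Path

# Same idea as the grep above: a "used" percentage with three or more digits.
BOGUS_PCT = re.compile(r" \((\d{3,})% used")

def bogus_percentages(text: str) -> list:
    """Return every implausible 'used' percentage found in a status message."""
    return [int(value) for value in BOGUS_PCT.findall(text)]

def count_by_host(histlogs: Path) -> Counter:
    """Count affected disk history files per host directory."""
    counts = Counter()
    for log in histlogs.glob("*/disk/*"):
        if log.is_file() and bogus_percentages(log.read_text(errors="replace")):
            counts[log.parent.parent.name] += 1
    return counts

if __name__ == "__main__":
    root = Path("/var/lib/xymon/histlogs")  # path taken from the prompt above
    if root.is_dir():
        for host, hits in count_by_host(root).most_common():
            print(f"{hits:4d}  {host}")
```

A per-host count might show whether a few clients dominate or whether the corruption is spread evenly, which would hint at a server-side cause.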

These are all Windows servers (most are Windows 2008, a few are Windows 2003).

Now today's latest example struck me:

red ]hu Dec 04 08:59:08 2014 - Filesystems NOT ok
&red 2427740 (2427740% used) has reached the PANIC level (95%)
&red 425724 (425724% used) has reached the PANIC level (95%)

Filesystem     1K-blocks     Used
  Avail    Capacity    Mounted      Summary(]
tal\Avail)
C              ]
2427740   32463796   19963944 ]
 61%       /FIXED/C        49.10]
\19.40gb
D               ]
425724    6310604   46115120    12%    ]
 /FIXED/D        49.10gb\43.1]
gb
Status unchanged in 0.00 minutes
Message received from 10.30.99.92
Client data ID 1417679951

Here the 1st line reads "red ]hu ...". I reckon that this line was generated on the Xymon server and not on the client, because it contains the server-side (central mode) interpretation of the client data. And even this Xymon-generated line now contains a ]! That suggests the clients are fine and the problem is on the Xymon server. This is in line with my other observation on the example above: it says e.g. "&red 2427740 (2427740% used)". So it has interpreted '2427740' as the drive letter, which is easily explained by '2427740' appearing at position 1 in one of the message lines. But why the same number again for the percentage used ('2427740% used')? To me it really seems that the Xymon server program is confused.
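
One way to separate the two cases is to check whether the first line of a stored status is still well formed. The sketch below is only a rough check: the header layout is inferred from the two samples in this message rather than from Xymon documentation, and header_is_clean is a made-up helper name.

```python
import re

# Assumed header layout, inferred only from the two samples in this message:
# "<color> <weekday> <month> <day> <hh:mm:ss> <year> - <text>"
HEADER = re.compile(
    r"^(green|yellow|red|clear|blue|purple) "
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4} - "
)

def header_is_clean(status_text: str) -> bool:
    """True if the first line of a status message matches the assumed layout."""
    first_line = status_text.split("\n", 1)[0]
    return HEADER.match(first_line) is not None

print(header_is_clean("yellow Thu Nov 27 08:44:51 2014 - Filesystems NOT ok"))  # True
print(header_is_clean("red ]hu Dec 04 08:59:08 2014 - Filesystems NOT ok"))     # False
```

Files that fail this check have damage in the part of the message that appears to be written by the server, so they are the most interesting ones to compare against what the client actually sent.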

Our environment:

Xymon 4.3.10
Red Hat Enterprise Linux ES release 4 (Nahant)
OS=32bit
Hardware=i686

I already searched the mailing list for 'garbled' problems and found some, but none seemed to match my problem, because they dealt with truncated messages. My messages are not truncated; they are garbled as in the examples above. Nevertheless, the recommendation there was to increase the MAX parameters, but I think ours are already high and I do not believe they are causing our problems. Our MAX settings are:

xymonserver.cfg:MAXMSGSPERCOMBO="100"
xymonserver.cfg:WMLMAXCHARS="1500"                              # Max number of bytes in a WAP message
xymonserver.cfg:BBMAXMSGSPERCOMBO="$MAXMSGSPERCOMBO"
xymonserver.cfg:MAXLINE="32768"
xymonserver.cfg:MAXMSG_DATA="10485760"
xymonserver.cfg:MAXMSG_STACHG="10485760"
xymonserver.cfg:MAXMSG_STATUS="10485760"
xymonserver.cfg:MAXMSG_NOTES="10485760"
xymonserver.cfg:MAXMSG_CLIENT="10485760"
xymonserver.cfg:MAXMSG_ENADIS="10485760"
xymonserver.cfg:MAXMSG_CLICHG="10485760"

Anyone any clue?

Thanks so much
Jürgen


