[Xymon] Weird scrambled/garbled disk data

Juergen Fischer jfische2 at csc.com
Thu Dec 4 11:24:11 CET 2014


Hello, I have been fighting with this problem for a few weeks now:

Occasionally we receive false disk alarms like the following:

yellow Thu Nov 27 08:44:51 2014 - Filesystems NOT ok
&red C (29143392% used) has reached the PANIC level (95%)

Filesystem   1K-blocks     Used       Avail    Capacity   Total Size Free Space   Type    Mount Poin

C     ]       52428796   23285404   29143392    44%         50.00 GB 27.79 GB   FIXED   N/A
D
    90906620   47401100   43505520    52%   ]
    86.70 GB     41.49 GB ]
FIXED   N/A
E            28668]
84   14513420  272167664     5% ]
     273.40 GB    259.56 ]
   FIXED   N/A
Status unchanged in 0.00 minutes
Message received from 10.158.10.1
Client data ID 1417074295

Note that the message contains ] and newline characters inserted at 
arbitrary positions. As a result, drive letters and drive sizes can appear 
in the wrong columns compared to a normal disk message. In the example 
above, the ] inserted into the C-drive line shifts the Avail value 
29143392 one column to the right, into the position where the usage 
percentage normally appears. Xymon therefore, quite naturally, interprets 
29143392 as the usage percentage and raises the alarm shown above: 
&red C (29143392% used)....
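To illustrate the column shift, here is a minimal Python sketch of a naive positional parser. It is a hypothetical function (parse_disk_line), not Xymon's actual disk-report parser; it simply assumes the drive name is the first whitespace-separated column and the capacity is the fifth. The CLEAN line is the C-drive line from the alert with the stray ] removed, and the 95% threshold is the PANIC level quoted in the alert.

```python
def parse_disk_line(line):
    """Hypothetical positional parser (not Xymon's code): drive name is
    column 1 and capacity is column 5 of the whitespace-split line."""
    cols = line.split()
    if len(cols) < 5:
        return None
    pct_text = cols[4].rstrip("%")
    if not pct_text.isdigit():
        return None
    return cols[0], int(pct_text)

# The C-drive line from the alert, with and without the stray "]".
CLEAN = "C   52428796   23285404   29143392    44%   50.00 GB 27.79 GB   FIXED   N/A"
GARBLED = "C     ]       52428796   23285404   29143392    44%         50.00 GB 27.79 GB   FIXED   N/A"

for label, line in (("clean", CLEAN), ("garbled", GARBLED)):
    name, pct = parse_disk_line(line)
    colour = "red" if pct >= 95 else "green"  # 95% PANIC level from the alert
    print(f"{label}: &{colour} {name} ({pct}% used)")
```

With the ] present, the fifth column is the Avail value, which reproduces exactly the "&red C (29143392% used)" alarm; without it, the same parser reads 44%.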
(PS: I just noticed that the first line says "yellow ..." while the second 
says "&red ...". No idea why, but let's ignore that for now.)
 
So my first guess was that the BBWin clients might be sending these 
corrupted disk messages, and to prove this I have been trying to catch such 
data on the client side. So far I have not managed to catch an example. 
However, whenever it occurs, I can easily list the latest examples on the 
Xymon server by grepping for huge percentages in the histlogs directory:

jfische2[Xymon(mewappfrk071v)]:/var/lib/xymon/histlogs>ls -lt `grep -l ' ([0-9][0-9][0-9][0-9]*% used' */disk/* `
-rw-rw-r--  1 hobbit hobbit  548 Dec  4 08:59 MEWAPPFRK092V/disk/Thu_Dec_4_08:59:11_2014
-rw-rw-r--  1 hobbit hobbit 1691 Dec  2 12:53 MEWDBSFRK462V/disk/Tue_Dec_2_12:53:41_2014
-rw-rw-r--  1 hobbit hobbit  885 Dec  2 10:39 MEWDBSFRK350V/disk/Tue_Dec_2_10:39:07_2014
-rw-rw-r--  1 hobbit hobbit  598 Dec  1 13:48 MEWDC8BOT001/disk/Mon_Dec_1_13:48:33_2014
-rw-rw-r--  1 hobbit hobbit  722 Dec  1 12:00 MEWDC8SAL001/disk/Mon_Dec_1_12:00:05_2014
-rw-rw-r--  1 hobbit hobbit  604 Nov 28 14:50 MEWDC8AMO001/disk/Fri_Nov_28_14:50:03_2014
-rw-rw-r--  1 hobbit hobbit  665 Nov 28 13:58 MEWDC8AMO001/disk/Fri_Nov_28_13:58:15_2014
-rw-rw-r--  1 hobbit hobbit  713 Nov 28 10:36 MEWDC8MAD001/disk/Fri_Nov_28_10:36:09_2014
-rw-rw-r--  1 hobbit hobbit  658 Nov 28 09:59 MEWDC8TOU001/disk/Fri_Nov_28_09:59:49_2014
-rw-rw-r--  1 hobbit hobbit  658 Nov 28 09:49 MEWDC8STE001/disk/Fri_Nov_28_09:49:28_2014
-rw-rw-r--  1 hobbit hobbit  656 Nov 28 08:16 MEWDC8STE001/disk/Fri_Nov_28_08:16:11_2014
-rw-rw-r--  1 hobbit hobbit  778 Nov 27 14:32 MEWDC8FEL001/disk/Thu_Nov_27_14:32:32_2014
-rw-rw-r--  1 hobbit hobbit  593 Nov 27 10:33 MEWDC8SAL001/disk/Thu_Nov_27_10:33:49_2014
-rw-rw-r--  1 hobbit hobbit  591 Nov 27 09:41 MEWDC8MAD001/disk/Thu_Nov_27_09:41:57_2014
-rw-rw-r--  1 hobbit hobbit  603 Nov 27 09:36 MEWDC8AVE001/disk/Thu_Nov_27_09:36:46_2014
-rw-rw-r--  1 hobbit hobbit  658 Nov 27 08:50 MEWDC8STE001/disk/Thu_Nov_27_08:50:07_2014
-rw-rw-r--  1 hobbit hobbit  597 Nov 27 08:44 MEWDC8LUB001/disk/Thu_Nov_27_08:44:55_2014
-rw-rw-r--  1 hobbit hobbit  658 Nov 27 08:39 MEWDC8STE001/disk/Thu_Nov_27_08:39:45_2014
-rw-rw-r--  1 hobbit hobbit  554 Nov 27 08:29 MEWAPPFRK974V/disk/Thu_Nov_27_08:29:25_2014
-rw-rw-r--  1 hobbit hobbit  787 Nov 26 16:45 MEWDC8AMO001/disk/Wed_Nov_26_16:45:05_2014
-rw-rw-r--  1 hobbit hobbit  885 Nov 26 16:19 MEWDBSFRK341V/disk/Wed_Nov_26_16:19:15_2014
-rw-rw-r--  1 hobbit hobbit 1692 Nov 26 15:37 MEWDBSFRK405/disk/Wed_Nov_26_15:37:46_2014
-rw-rw-r--  1 hobbit hobbit  674 Nov 25 17:07 MEWDBSFRK419V/disk/Tue_Nov_25_17:07:49_2014
-rw-rw-r--  1 hobbit hobbit  729 Nov 25 13:50 MEWAPPFRK760V/disk/Tue_Nov_25_13:50:33_2014
-rw-rw-r--  1 hobbit hobbit  584 Nov 25 12:48 MEWAPPFRK910V/disk/Tue_Nov_25_12:48:18_2014
-rw-rw-r--  1 hobbit hobbit  605 Nov 25 12:22 MEWDC8WIE010/disk/Tue_Nov_25_12:22:28_2014
-rw-rw-r--  1 hobbit hobbit  597 Nov 25 08:08 MEWDC8LUB001/disk/Tue_Nov_25_08:08:14_2014
-rw-rw-r--  1 hobbit hobbit  662 Nov 24 16:39 MEWDC8SPY001/disk/Mon_Nov_24_16:39:24_2014
-rw-rw-r--  1 hobbit hobbit  726 Nov 24 14:56 MEWDC8SCW001/disk/Mon_Nov_24_14:56:12_2014
-rw-rw-r--  1 hobbit hobbit 1967 Nov 24 14:45 MEWDBSFRK461V/disk/Mon_Nov_24_14:45:53_2014
-rw-rw-r--  1 hobbit hobbit  842 Nov 24 14:04 MEWDBSFRK439V/disk/Mon_Nov_24_14:04:27_2014
-rw-rw-r--  1 hobbit hobbit  697 Nov 24 13:28 MEWAPPFRK763V/disk/Mon_Nov_24_13:28:18_2014
-rw-rw-r--  1 hobbit hobbit  596 Nov 24 09:56 MEWDC8TOU001/disk/Mon_Nov_24_09:56:02_2014
-rw-rw-r--  1 hobbit hobbit  911 Nov 21 15:17 MEWDBSFRK355V/disk/Fri_Nov_21_15:17:35_2014
-rw-rw-r--  1 hobbit hobbit  490 Nov 21 12:37 MEWBCSFRK122V/disk/Fri_Nov_21_12:37:22_2014
-rw-rw-r--  1 hobbit hobbit  894 Nov 21 07:51 MEWDBSFRK419V/disk/Fri_Nov_21_07:51:19_2014
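If this check needs to run from a script or cron job, a rough Python equivalent of the grep above could look like the sketch below. The function name find_suspect_histlogs is hypothetical, and the default path is the histlogs directory from the prompt above. It looks for three or more digits before "% used", sorts hits newest first like ls -lt, and returns paths relative to the histlogs directory.

```python
import glob
import os
import re

# Same idea as the grep: " (" followed by three or more digits and "% used".
SUSPECT = re.compile(r" \(\d{3,}% used")

def find_suspect_histlogs(histlogs_dir):
    """Return <host>/disk/<timestamp> paths containing a usage of >= 100%,
    newest first (like ls -lt). Hypothetical helper, not part of Xymon."""
    hits = []
    for path in glob.glob(os.path.join(histlogs_dir, "*", "disk", "*")):
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as fh:
            if SUSPECT.search(fh.read()):
                hits.append(path)
    hits.sort(key=os.path.getmtime, reverse=True)
    return [os.path.relpath(p, histlogs_dir) for p in hits]

if __name__ == "__main__":
    for rel in find_suspect_histlogs("/var/lib/xymon/histlogs"):
        print(rel)
```

Like the grep, this also matches a genuine "100% used", so such entries would still need to be filtered out by hand.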

These are all Windows servers (most are Windows 2008, a few are Windows 
2003).

Then today's latest example struck me:

red ]hu Dec 04 08:59:08 2014 - Filesystems NOT ok
&red 2427740 (2427740% used) has reached the PANIC level (95%)
&red 425724 (425724% used) has reached the PANIC level (95%)

Filesystem     1K-blocks     Used
  Avail    Capacity    Mounted      Summary(]
tal\Avail)
C              ]
2427740   32463796   19963944 ]
 61%       /FIXED/C        49.10]
\19.40gb
D               ]
425724    6310604   46115120    12%    ]
 /FIXED/D        49.10gb\43.1]
gb
Status unchanged in 0.00 minutes
Message received from 10.30.99.92
Client data ID 1417679951

In the first line you see "red ]hu ...". I reckon this line was generated 
by the Xymon server and not by the client, because it contains the 
server-side (central mode) interpretation of the client data. And even this 
Xymon-generated line now contains a ]! That suggests the clients are fine 
and the problem lies on the Xymon server. This also fits another 
observation about the example above: it says "&red 2427740 (2427740% used)". 
So Xymon has interpreted '2427740' as the drive letter, which is easily 
explained by '2427740' appearing at position 1 in one of the message lines. 
But why does the same number appear again as the percentage used 
('2427740% used')? To me it really looks as if the Xymon server program is 
confused.
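To separate corrupted reports from real disk alarms more systematically, a heuristic check along these lines might help when scanning histlogs. It is a hypothetical sketch (garbling_signs is not part of Xymon) and assumes the status layout seen in the two examples above: a server-generated first line starting with a colour and a weekday, followed by &red/&yellow lines of the form "NAME (NN% used)". It also assumes Windows drives are reported as a bare letter, optionally with a colon.

```python
import re

# Server-generated first line: colour, then a weekday, e.g. "yellow Thu Nov 27 ...".
STATUS_LINE = re.compile(r"^(red|yellow|green) (Mon|Tue|Wed|Thu|Fri|Sat|Sun) ")
# Alert lines, e.g. "&red C (29143392% used) has reached the PANIC level (95%)".
ALERT_LINE = re.compile(r"^&(red|yellow) (\S+) \((\d+)% used\)")
# Assumption: Windows drives appear as a bare letter, optionally with a colon.
DRIVE_NAME = re.compile(r"[A-Za-z]:?")

def garbling_signs(status_text):
    """Hypothetical heuristic: list reasons a disk status looks corrupted."""
    lines = status_text.splitlines()
    signs = []
    if lines and not STATUS_LINE.match(lines[0]):
        signs.append("server-generated status line is malformed")
    for line in lines[1:]:
        m = ALERT_LINE.match(line)
        if not m:
            continue
        name, pct = m.group(2), int(m.group(3))
        if pct > 100:
            signs.append(f"impossible usage {pct}% for {name}")
        if not DRIVE_NAME.fullmatch(name):
            signs.append(f"{name!r} is not a drive letter")
    return signs

EXAMPLE_1 = (
    "yellow Thu Nov 27 08:44:51 2014 - Filesystems NOT ok\n"
    "&red C (29143392% used) has reached the PANIC level (95%)\n"
)
EXAMPLE_2 = (
    "red ]hu Dec 04 08:59:08 2014 - Filesystems NOT ok\n"
    "&red 2427740 (2427740% used) has reached the PANIC level (95%)\n"
    "&red 425724 (425724% used) has reached the PANIC level (95%)\n"
)

for text in (EXAMPLE_1, EXAMPLE_2):
    print(garbling_signs(text))
```

The first example only trips the percentage check, while the second also fails on the damaged status line and on the non-drive names, which matches the observation that the server-side line itself is affected.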

Our environment:

Xymon 4.3.10
Red Hat Enterprise Linux ES release 4 (Nahant)
OS=32bit
Hardware=i686

I have already searched the mailing list for 'garbled' problems and found 
some threads, but none of them seemed to match my problem, because they 
dealt with truncated messages. My messages are not truncated; they are 
garbled as in the examples above. Nevertheless, the recommendation in those 
threads was to increase the MAX parameters, but ours are already high and 
I don't believe they are causing our problems. Our MAX settings are:

xymonserver.cfg:MAXMSGSPERCOMBO="100"
xymonserver.cfg:WMLMAXCHARS="1500"                              # Max number of bytes in a WAP message
xymonserver.cfg:BBMAXMSGSPERCOMBO="$MAXMSGSPERCOMBO"
xymonserver.cfg:MAXLINE="32768"
xymonserver.cfg:MAXMSG_DATA="10485760"
xymonserver.cfg:MAXMSG_STACHG="10485760"
xymonserver.cfg:MAXMSG_STATUS="10485760"
xymonserver.cfg:MAXMSG_NOTES="10485760"
xymonserver.cfg:MAXMSG_CLIENT="10485760"
xymonserver.cfg:MAXMSG_ENADIS="10485760"
xymonserver.cfg:MAXMSG_CLICHG="10485760"
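As a rough sanity check of that belief, the sketch below reads the numeric KEY="value" settings from grep output like the above and reports which byte limits (MAXLINE and MAXMSG_*) would be smaller than a given message size. The function names are hypothetical. The size used here is the largest corrupted status log in the histlogs listing (1967 bytes); a histlog only holds the status text, so the raw BBWin client message may be considerably larger, and a measured client message size would be the better input.

```python
import re

# Matches KEY="12345", optionally prefixed by "file:" as in grep output.
CFG_LINE = re.compile(r'^(?:\S+?:)?(\w+)="(\d+)"')

def numeric_limits(cfg_text):
    """Collect numeric KEY="value" settings; skips $VAR references."""
    limits = {}
    for line in cfg_text.splitlines():
        m = CFG_LINE.match(line.strip())
        if m:
            limits[m.group(1)] = int(m.group(2))
    return limits

def limits_below(limits, size):
    """Names of byte limits (MAXLINE, MAXMSG_*) smaller than size."""
    return sorted(
        key for key, value in limits.items()
        if (key == "MAXLINE" or key.startswith("MAXMSG_")) and value < size
    )

CFG = """\
xymonserver.cfg:MAXMSGSPERCOMBO="100"
xymonserver.cfg:BBMAXMSGSPERCOMBO="$MAXMSGSPERCOMBO"
xymonserver.cfg:MAXLINE="32768"
xymonserver.cfg:MAXMSG_DATA="10485760"
xymonserver.cfg:MAXMSG_STATUS="10485760"
xymonserver.cfg:MAXMSG_CLIENT="10485760"
"""

LARGEST_STATUS_LOG = 1967  # bytes, the MEWDBSFRK461V entry in the listing

limits = numeric_limits(CFG)
print("limits below message size:", limits_below(limits, LARGEST_STATUS_LOG))
```

An empty list means none of the listed byte limits is smaller than the given size.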

Does anyone have a clue?

Thanks so much
Jürgen