[Xymon] Weird scrambled/garbled disk data
Juergen Fischer
jfische2 at csc.com
Thu Dec 4 11:24:11 CET 2014
Hello, I have been fighting this problem for a few weeks now.
Occasionally we receive false disk alarms like the following:
yellow Thu Nov 27 08:44:51 2014 - Filesystems NOT ok
&red C (29143392% used) has reached the PANIC level (95%)
Filesystem 1K-blocks Used Avail Capacity Total Size Free
Space Type Mount Poin
C ] 52428796 23285404 29143392 44% 50.00 GB 27.79
GB FIXED N/A
D
90906620 47401100 43505520 52% ]
86.70 GB 41.49 GB ]
FIXED N/A
E 28668]
84 14513420 272167664 5% ]
273.40 GB 259.56 ]
FIXED N/A
Status unchanged in 0.00 minutes
Message received from 10.158.10.1
Client data ID 1417074295
Note that the message contains ] and newline characters inserted at
arbitrary positions, so drive letters and drive sizes may appear in the
wrong positions compared to a normal disk message. One could argue that
if (as in the above example) Xymon receives ' 29143392' in the C-drive
line shifted one position to the right by the inserted ], to where the
usage percentage normally appears, it will of course interpret
' 29143392' as the usage percentage and issue an alarm like the one above:
&red C (29143392% used)....
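The column shift described above can be illustrated with a small Python sketch. This is not Xymon's actual parser; `parse_capacity` is a hypothetical helper that picks the capacity field purely by column position, and the "clean" line is my assumption of what the C-drive line would look like without the stray ].

```python
def parse_capacity(line):
    """Return (filesystem, percent_used), picking fields by position only.

    Assumed column layout (from the header in the status message):
    Filesystem, 1K-blocks, Used, Avail, Capacity, ...
    """
    fields = line.split()
    return fields[0], int(fields[4].rstrip("%"))


# Assumed clean form of the C-drive line (stray "]" removed).
clean = "C 52428796 23285404 29143392 44% 50.00 GB 27.79 GB FIXED N/A"
# The C-drive line as it appears in the garbled message, joined onto one line.
garbled = "C ] 52428796 23285404 29143392 44% 50.00 GB 27.79 GB FIXED N/A"

print(parse_capacity(clean))    # capacity column holds "44%"
print(parse_capacity(garbled))  # "]" shifts Avail into the capacity column
```

With the extra ] token, the Avail value 29143392 lands in the fifth column, which matches the "(29143392% used)" in the alarm.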
(PS: I just spotted that the 1st line says "yellow ...", but the 2nd says
"&red ..." - no idea why - but let's ignore this for now!)
So my first guess was that the BBWin clients send these corrupted disk
messages, and to prove this I tried to catch such data on the client
side. So far I have not managed to catch an example there. However,
whenever it occurs I can easily list the latest examples on the Xymon
server by grepping for huge percentages in the histlogs directory:
jfische2[Xymon(mewappfrk071v)]:/var/lib/xymon/histlogs>ls -lt `grep -l ' ([0-9][0-9][0-9][0-9]*% used' */disk/* `
-rw-rw-r-- 1 hobbit hobbit 548 Dec 4 08:59 MEWAPPFRK092V/disk/Thu_Dec_4_08:59:11_2014
-rw-rw-r-- 1 hobbit hobbit 1691 Dec 2 12:53 MEWDBSFRK462V/disk/Tue_Dec_2_12:53:41_2014
-rw-rw-r-- 1 hobbit hobbit 885 Dec 2 10:39 MEWDBSFRK350V/disk/Tue_Dec_2_10:39:07_2014
-rw-rw-r-- 1 hobbit hobbit 598 Dec 1 13:48 MEWDC8BOT001/disk/Mon_Dec_1_13:48:33_2014
-rw-rw-r-- 1 hobbit hobbit 722 Dec 1 12:00 MEWDC8SAL001/disk/Mon_Dec_1_12:00:05_2014
-rw-rw-r-- 1 hobbit hobbit 604 Nov 28 14:50 MEWDC8AMO001/disk/Fri_Nov_28_14:50:03_2014
-rw-rw-r-- 1 hobbit hobbit 665 Nov 28 13:58 MEWDC8AMO001/disk/Fri_Nov_28_13:58:15_2014
-rw-rw-r-- 1 hobbit hobbit 713 Nov 28 10:36 MEWDC8MAD001/disk/Fri_Nov_28_10:36:09_2014
-rw-rw-r-- 1 hobbit hobbit 658 Nov 28 09:59 MEWDC8TOU001/disk/Fri_Nov_28_09:59:49_2014
-rw-rw-r-- 1 hobbit hobbit 658 Nov 28 09:49 MEWDC8STE001/disk/Fri_Nov_28_09:49:28_2014
-rw-rw-r-- 1 hobbit hobbit 656 Nov 28 08:16 MEWDC8STE001/disk/Fri_Nov_28_08:16:11_2014
-rw-rw-r-- 1 hobbit hobbit 778 Nov 27 14:32 MEWDC8FEL001/disk/Thu_Nov_27_14:32:32_2014
-rw-rw-r-- 1 hobbit hobbit 593 Nov 27 10:33 MEWDC8SAL001/disk/Thu_Nov_27_10:33:49_2014
-rw-rw-r-- 1 hobbit hobbit 591 Nov 27 09:41 MEWDC8MAD001/disk/Thu_Nov_27_09:41:57_2014
-rw-rw-r-- 1 hobbit hobbit 603 Nov 27 09:36 MEWDC8AVE001/disk/Thu_Nov_27_09:36:46_2014
-rw-rw-r-- 1 hobbit hobbit 658 Nov 27 08:50 MEWDC8STE001/disk/Thu_Nov_27_08:50:07_2014
-rw-rw-r-- 1 hobbit hobbit 597 Nov 27 08:44 MEWDC8LUB001/disk/Thu_Nov_27_08:44:55_2014
-rw-rw-r-- 1 hobbit hobbit 658 Nov 27 08:39 MEWDC8STE001/disk/Thu_Nov_27_08:39:45_2014
-rw-rw-r-- 1 hobbit hobbit 554 Nov 27 08:29 MEWAPPFRK974V/disk/Thu_Nov_27_08:29:25_2014
-rw-rw-r-- 1 hobbit hobbit 787 Nov 26 16:45 MEWDC8AMO001/disk/Wed_Nov_26_16:45:05_2014
-rw-rw-r-- 1 hobbit hobbit 885 Nov 26 16:19 MEWDBSFRK341V/disk/Wed_Nov_26_16:19:15_2014
-rw-rw-r-- 1 hobbit hobbit 1692 Nov 26 15:37 MEWDBSFRK405/disk/Wed_Nov_26_15:37:46_2014
-rw-rw-r-- 1 hobbit hobbit 674 Nov 25 17:07 MEWDBSFRK419V/disk/Tue_Nov_25_17:07:49_2014
-rw-rw-r-- 1 hobbit hobbit 729 Nov 25 13:50 MEWAPPFRK760V/disk/Tue_Nov_25_13:50:33_2014
-rw-rw-r-- 1 hobbit hobbit 584 Nov 25 12:48 MEWAPPFRK910V/disk/Tue_Nov_25_12:48:18_2014
-rw-rw-r-- 1 hobbit hobbit 605 Nov 25 12:22 MEWDC8WIE010/disk/Tue_Nov_25_12:22:28_2014
-rw-rw-r-- 1 hobbit hobbit 597 Nov 25 08:08 MEWDC8LUB001/disk/Tue_Nov_25_08:08:14_2014
-rw-rw-r-- 1 hobbit hobbit 662 Nov 24 16:39 MEWDC8SPY001/disk/Mon_Nov_24_16:39:24_2014
-rw-rw-r-- 1 hobbit hobbit 726 Nov 24 14:56 MEWDC8SCW001/disk/Mon_Nov_24_14:56:12_2014
-rw-rw-r-- 1 hobbit hobbit 1967 Nov 24 14:45 MEWDBSFRK461V/disk/Mon_Nov_24_14:45:53_2014
-rw-rw-r-- 1 hobbit hobbit 842 Nov 24 14:04 MEWDBSFRK439V/disk/Mon_Nov_24_14:04:27_2014
-rw-rw-r-- 1 hobbit hobbit 697 Nov 24 13:28 MEWAPPFRK763V/disk/Mon_Nov_24_13:28:18_2014
-rw-rw-r-- 1 hobbit hobbit 596 Nov 24 09:56 MEWDC8TOU001/disk/Mon_Nov_24_09:56:02_2014
-rw-rw-r-- 1 hobbit hobbit 911 Nov 21 15:17 MEWDBSFRK355V/disk/Fri_Nov_21_15:17:35_2014
-rw-rw-r-- 1 hobbit hobbit 490 Nov 21 12:37 MEWBCSFRK122V/disk/Fri_Nov_21_12:37:22_2014
-rw-rw-r-- 1 hobbit hobbit 894 Nov 21 07:51 MEWDBSFRK419V/disk/Fri_Nov_21_07:51:19_2014
These are all Windows servers (most are Windows 2008, a few are Windows
2003).
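For anyone who wants to repeat the histlogs search without the shell one-liner, here is a rough Python equivalent. It is only a sketch: `find_suspect_logs` is a name I made up, and it assumes the same `<host>/disk/<timestamp>` layout under the histlogs directory as in the listing above. Like the grep pattern, it flags any "used" percentage with three or more digits, so a genuine 100% would also be listed.

```python
import re
from pathlib import Path

# "(" followed by three or more digits and "% used", as in the grep pattern.
SUSPECT = re.compile(r"\(\d{3,}% used")


def find_suspect_logs(histlogs_dir):
    """Return disk histlog files with implausible percentages, newest first."""
    hits = []
    for path in Path(histlogs_dir).glob("*/disk/*"):
        # errors="replace" keeps the scan going if a log holds odd bytes.
        if path.is_file() and SUSPECT.search(path.read_text(errors="replace")):
            hits.append(path)
    # Sort by modification time, newest first, like "ls -lt".
    return sorted(hits, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # The path is the histlogs directory from the shell prompt above.
    for log in find_suspect_logs("/var/lib/xymon/histlogs"):
        print(log)
```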
Today's latest example struck me:
red ]hu Dec 04 08:59:08 2014 - Filesystems NOT ok
&red 2427740 (2427740% used) has reached the PANIC level (95%)
&red 425724 (425724% used) has reached the PANIC level (95%)
Filesystem 1K-blocks Used
Avail Capacity Mounted Summary(]
tal\Avail)
C ]
2427740 32463796 19963944 ]
61% /FIXED/C 49.10]
\19.40gb
D ]
425724 6310604 46115120 12% ]
/FIXED/D 49.10gb\43.1]
gb
Status unchanged in 0.00 minutes
Message received from 10.30.99.92
Client data ID 1417679951
Here the 1st line reads "red ]hu ...". I reckon that this line was
generated at the Xymon server and not at the client, because it contains
the server-side (central mode) interpretation of the client data. And even
this Xymon-generated line now contains a ]! That suggests the clients
are ok and the problem is at the Xymon server. This is also in line
with my other observation on the above example: it says e.g. "&red 2427740
(2427740% used)". So it has interpreted '2427740' as the drive letter,
which is easily explained by '2427740' appearing at position 1 in one of
the message lines. But why does that same number appear again as the
percentage used ('2427740% used')? To me it really seems that the Xymon
server program is confused.
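To find further cases where the first line of the status itself is damaged, a check along these lines might help. It is a sketch only: the header pattern is inferred from the two examples in this mail (colour, weekday, month, day, time, year, " - "), not from the Xymon source, and `header_is_garbled` is a hypothetical name.

```python
import re

# Header shape inferred from the examples above, e.g.
# "yellow Thu Nov 27 08:44:51 2014 - Filesystems NOT ok"
HEADER = re.compile(
    r"^(green|yellow|red|blue|purple|clear) "
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\d{1,2} "
    r"\d{2}:\d{2}:\d{2} \d{4} - "
)


def header_is_garbled(status_text):
    """Return True if the first line does not look like a normal status header."""
    lines = status_text.splitlines()
    first_line = lines[0] if lines else ""
    return HEADER.match(first_line) is None


print(header_is_garbled("yellow Thu Nov 27 08:44:51 2014 - Filesystems NOT ok"))
print(header_is_garbled("red ]hu Dec 04 08:59:08 2014 - Filesystems NOT ok"))
```

The first header passes the check, while the second one ("]hu" instead of "Thu") is flagged.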
Our environment:
Xymon 4.3.10
Red Hat Enterprise Linux ES release 4 (Nahant)
OS=32bit
Hardware=i686
I have already searched the mailing list for 'garbled' problems and found
some, but none of them seemed to match my problem, because they
dealt with truncated messages. My messages are not truncated - they are
garbled as in the above examples. Nevertheless, the recommendation there
was to increase the MAX parameters, but I think ours are already high and
I do not believe they are causing our problems. Our MAX settings are:
xymonserver.cfg:MAXMSGSPERCOMBO="100"
xymonserver.cfg:WMLMAXCHARS="1500" # Max number of bytes in a WAP message
xymonserver.cfg:BBMAXMSGSPERCOMBO="$MAXMSGSPERCOMBO"
xymonserver.cfg:MAXLINE="32768"
xymonserver.cfg:MAXMSG_DATA="10485760"
xymonserver.cfg:MAXMSG_STACHG="10485760"
xymonserver.cfg:MAXMSG_STATUS="10485760"
xymonserver.cfg:MAXMSG_NOTES="10485760"
xymonserver.cfg:MAXMSG_CLIENT="10485760"
xymonserver.cfg:MAXMSG_ENADIS="10485760"
xymonserver.cfg:MAXMSG_CLICHG="10485760"
Does anyone have a clue?
Thanks so much
Jürgen