[Xymon] False alert on disk
Lars Kollstedt
lk at man-da.de
Mon Nov 28 23:14:14 CET 2016
Hi Neil, hi List,
on Friday, 25. November 2016, 13:36:29 Neil Simmonds wrote:
> Hi all,
>
> I'm getting a strange false alert on one of our Xymon systems.
> We got an alert for disk and the webpage output looks like this,
>
>
> Fri Nov 25 13:18:56 2016 - Filesystems NOT ok
>
> red 99 (0 units free) has reached the PANIC level (524288 units)
>
> red GB (18446744073709551615 units free) has reached the PANIC level (524288
> units)
>
> red N/A (18446744073709551615 units free) has reached the PANIC level
> (524288 units)
>
>
>
> Filesystem 1K-blocks Used Avail Capacity Total Size Free
> Space Type Mount Point
[...]
> C 52420060 33935280 18484780 64% 49
>
> 99 GB] 17.63 GB FIXED N/A
[...]
>
> Notice that some of the lines seem to have spurious line feeds, there is a
> square bracket that has appeared and we have some letters missing.
>
> When I clicked on the link for the client data this is what the disk section
> looks like.
[...]
> As you can see, there doesn't appear to be anything wrong with this.
Yes. I'm not completely sure that this would always show up here already. But I
captured the client message channel and analyzed it with a script, and the
messages I got were all OK.
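A check of this kind could look roughly like the sketch below. It is not the script used here. The helper name `suspicious_df_lines` is hypothetical, and it assumes a six-column df layout with one line per filesystem, as in the examples in this thread. Wrapped lines (long device names) would therefore also be reported.

```python
import re

# Hypothetical helper, not part of Xymon.
# Assumed row layout: filesystem, blocks, used, available, capacity, mount point.
ROW = re.compile(r"^\S+\s+\d+\s+\d+\s+\d+\s+\d+%\s+/\S*$")


def suspicious_df_lines(section):
    """Return the data lines of a df section that do not parse as a clean row."""
    lines = [ln for ln in section.splitlines() if ln.strip()]
    # The first non-empty line is treated as the header and skipped.
    return [ln for ln in lines[1:] if not ROW.match(ln.strip())]
```

An empty result means every data line had the expected shape. Stray brackets or line breaks inside a row show up as returned lines.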
>
> The only difference that I am aware of with this is that on our system where
> we are not seeing this, we are running Xymon 4.3.4 on CentOS 5.6 and on the
> one where we are seeing the issue we are running Xymon 4.3.4 on CentOS 6.3
>
[...]
> Has anyone ever seen this kind of behaviour?
Yes, I had the same issue some weeks ago on a really old 4.3.0.0-beta2. It
turned out to be caused by an initialization issue when truncating client
messages. So the trigger was a large client message from the client that
reported just before.
My workaround for this was to allow larger client messages, but I'm not sure
the underlying problem has no security impact, since the behavior is strange
and suggests wrongly initialized pointers or data left over in
hobbitd_worker.c / xymond_worker.c when messages are truncated.
Mainly the fragment you quote as "99 GB] " made me worry about this. Where was
it broken off from? I had it, too; see the examples below. And it definitely
wasn't in this place in the client message passed to the hobbitd_client /
xymond_client worker.
After lots of debugging I saw the "Got over-size message, truncating at"
message that led me to the cause.
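To look for the same symptom, one can search the server logs for that text. The sketch below is only an illustration: the function name `oversize_events` is hypothetical, and nothing is assumed about the log line beyond the quoted marker text (which log file it ends up in depends on the installation).

```python
# Marker text as quoted in this mail; the rest of the log line is not assumed.
MARKER = "Got over-size message, truncating at"


def oversize_events(log_lines):
    """Yield (line_number, line) for every log line containing the marker."""
    for number, line in enumerate(log_lines, start=1):
        if MARKER in line:
            yield number, line.rstrip("\n")
```

It accepts any iterable of lines, so an open log file can be passed directly. Comparing the hits with the time of a false alert shows whether an oversize message arrived shortly before it.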
But I haven't had the time to really hunt it down so far. :-( Possibly I'm also
not familiar enough with the Xymon code for this. ;-)
I often also had a bracket and sometimes a line break, but sometimes neither of
them, within the header line of the df output.
It randomly affected different machines, and the square brackets were also
found within the ports status reported by the hobbitd_client / xymond_client
worker, but they didn't result in red statuses there because our analysis rules
for the ports are mostly less strict.
**** False Positive Message ****
manda4.hrz.tu-darmstadt.de:disk red [443790]
red Sat Oct 15 04]20:35 CEST 2016 - Filesystems NOT ok
&red 15594972 15% / (2651148% used) has reached the PANIC level (95%)
&red 609648 1% /run (444% used) has reached the PANIC level (95%)
&red 2% /tmp (1787588% used) has reached the PANIC level (95%)
&red 13324360 3% /home (360668% used) has reached the PANIC level (95%)
&red 44667620 6% /srv (2574396% used) has reached the PANIC level (95%)
&red 39834852 13% /var (5784076% used) has reached the PANIC level (95%)
&red 4472720 4% /var/lib/mysql (179952% used) has reached the PANIC
level (95%)
&red 10% /var/lib/hobbit (116445760% used) has reached the PANIC level (95%)
Filesystem 1024-bloc
s ]
Use] Available Capacity Mounted on
/dev/sda1 19222656 2651148 15594972 15% /
udev 3041408 4 3041404 1% /dev
tmpfs 610092 444 609648 1% /run
none 5120 0 5120 0% /run/lock
none 3050460 0 3050460 0% /run/shm
/dev/sda7
19210]6 35864 1787588 2% /tmp
/dev/sda8 14417392 360668 13324360 3% /home
/dev/sda9 49770220 2574396 44667620 6% /srv
/dev/sda6 48060296 5784076 39834852 13% /var
/dev/sda10 4914816 179952 4472720 4% /var/lib/mysql
/dev/sda11 1
531996] 11656580 116445760 10% /var/lib/hobbit
**** False Positive Message ****
maven01-vb.hrz.tu-darmstadt.de:disk red [774507]
red Sat Oct 15 09:46:22 CEST 2016 - Filesystems NOT ok
&red 1% /run (406100% used) has reached the PANIC level (95%)
Filesystem 1024-blocks Used
Available Capacity Mounted on
udev 10240 0
10240 0% /dev
t
pfs ] 406356 256
406100 1% /run
/dev/disk/by-uuid/298ee340-256f-4430-bba1-a14a475728c1 19222656 4254772
13991348 24% /
tmpfs 5120 0
5120
0% /r]n/lock
tmpfs 1398620 0
1398620 0% /run/shm
/dev/sda1 350275 19677
311910 6% /boot
/dev/sda5 8484528 220312
7833216 3% /home
/dev/sdb1 31391836 6749152
23069824 23% /mnt/vol0
Since allowing larger client messages the issues are gone. The oversize message
came from the machine reporting one or two client messages before. As far as I
could reproduce it, the client message from the machine in between was
completely ignored if the cause was two messages before.
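The false positives above share one tell-tale sign: the "% used" figure is far above 100, because a block count from the df line (2651148 for /, 444 for /run) was read as the percentage. A small filter for that pattern could look like the sketch below. The function name `impossible_usage` is hypothetical, and the pattern only assumes the "(N% used)" wording seen in the status messages above.

```python
import re

# Matches the "(N% used)" part of a disk status line as shown in the examples.
USED = re.compile(r"\((\d+)% used\)")


def impossible_usage(status_text, limit=100):
    """Return status lines whose reported '% used' value is above limit."""
    hits = []
    for line in status_text.splitlines():
        match = USED.search(line)
        if match and int(match.group(1)) > limit:
            hits.append(line)
    return hits
```

Any line it returns cannot be a genuine disk alert, which makes it a cheap way to tell this kind of corruption apart from a really full filesystem.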
Kind regards.
Lars
--
man-da.de GmbH, AS8365 Phone: +49 6151 16-71027
Mornewegstraße 30 Fax: +49 6151 16-71198
D-64293 Darmstadt e-mail: lk at man-da.de
Geschäftsführer Marcus Stögbauer AG Darmstadt, HRB 94 84