[Xymon] False alert on disk

Mon Nov 28 23:14:14 CET 2016

Hi Neil, hi List,

on Friday, 25. November 2016, 13:36:29 Neil Simmonds wrote:
> Hi all,
> 
> I'm getting a strange false alert on one of our Xymon systems.
> We got an alert for disk and the webpage output looks like this, 
> 
> 
> Fri Nov 25 13:18:56 2016 - Filesystems NOT ok
> 
> red 99 (0 units free) has reached the PANIC level (524288 units)
> 
> red GB (18446744073709551615 units free) has reached the PANIC level (524288
> units)
> 
> red N/A (18446744073709551615 units free) has reached the PANIC level
> (524288 units)
> 
> 
> 
> Filesystem   1K-blocks     Used       Avail    Capacity   Total Size   Free
> Space   Type    Mount Point
[...]
> C             52420060   33935280   18484780    64%         49
> 
> 99 GB]    17.63 GB   FIXED   N/A
[...]
> 
> Notice that some of the lines seem to have spurious line feeds, there is a
> square bracket that has appeared and we have some letters missing.
> 
> When I clicked on the link for the client data this is what the disk section
> looks like.
[...]
> As you can see, there doesn't appear to be anything wrong with this.
Yes. I'm not completely sure that it would always show up here already. But I 
captured the client message channel and analyzed it with a script, and the 
messages I got were all OK.

> 
> The only difference that I am aware of with this is that on our system where
> we are not seeing this, we are running Xymon 4.3.4 on CentOS 5.6 and on the
> one where we are seeing the issue we are running Xymon 4.3.4 on CentOS 6.3
> 
[...]
> Has anyone ever seen this kind of behaviour?

Yes, I had the same issue some weeks ago on a really old 4.3.0.0-beta2. It 
turned out this was caused by an initialization issue when truncating client 
messages. So it was triggered by a large client message from the client that 
reported just before. 
My workaround was to allow larger client messages, but I'm not sure whether 
this has a security impact, since the behavior still looks like incorrectly 
initialized pointers or data left over in hobbitd_worker.c / xymond_worker.c 
when truncating messages.
What worried me most was the "99 GB] " in your output. Where did this get 
broken off from? I had it, too. See the examples below. And it definitely was 
not at this place in the client message passed to the hobbitd_client / 
xymond_client worker.

After lots of debugging I saw the "Got over-size message, truncating at" 
message, which led me to the cause.
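
To check whether a server is hitting the same truncation, one could scan its 
log for that message. The following is only a minimal sketch: the function 
name is made up for illustration, it matches nothing beyond the text quoted 
above, and it makes no assumption about the rest of the log line or where the 
log file lives.

```python
# Only the text quoted in this mail is matched; the rest of the log line
# is returned verbatim because its exact format is not assumed here.
OVERSIZE_MARKER = "Got over-size message, truncating at"


def find_oversize_lines(log_lines):
    """Return (line_number, line) pairs for lines with the truncation warning.

    log_lines may be any iterable of strings, for example an open file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if OVERSIZE_MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

It can be fed an open log file, e.g. find_oversize_lines(open(path)), where 
path is whatever log the server daemon writes to on your installation.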

But I haven't had the time to really hunt it down so far. :-( Possibly I'm 
also not familiar enough with the Xymon code for this. ;-) 

I often also had a bracket and sometimes a line break, but sometimes neither, 
within the header line of the df output.
It randomly affected different machines, and the square brackets were also 
found in the ports status reported by the hobbitd_client / xymond_client 
worker, but they didn't result in red statuses there because our analysis 
rules for the ports are mostly less strict.

**** False Positive Message ****
manda4.hrz.tu-darmstadt.de:disk red [443790]
red Sat Oct 15 04]20:35 CEST 2016 - Filesystems NOT ok
&red 15594972      15% / (2651148% used) has reached the PANIC level (95%)
&red 609648       1% /run (444% used) has reached the PANIC level (95%)
&red 2% /tmp (1787588% used) has reached the PANIC level (95%)
&red 13324360       3% /home (360668% used) has reached the PANIC level (95%)
&red 44667620       6% /srv (2574396% used) has reached the PANIC level (95%)
&red 39834852      13% /var (5784076% used) has reached the PANIC level (95%)
&red 4472720       4% /var/lib/mysql (179952% used) has reached the PANIC 
level (95%)
&red 10% /var/lib/hobbit (116445760% used) has reached the PANIC level (95%)

Filesystem     1024-bloc
s   ]
Use] Available Capacity Mounted on
/dev/sda1         19222656  2651148  15594972      15% /
udev               3041408        4   3041404       1% /dev
tmpfs               610092      444    609648       1% /run
none                  5120        0      5120       0% /run/lock
none               3050460        0   3050460       0% /run/shm
/dev/sda7        
 19210]6    35864   1787588       2% /tmp
/dev/sda8         14417392   360668  13324360       3% /home
/dev/sda9         49770220  2574396  44667620       6% /srv
/dev/sda6         48060296  5784076  39834852      13% /var
/dev/sda10         4914816   179952   4472720       4% /var/lib/mysql
/dev/sda11       1
531996] 11656580 116445760      10% /var/lib/hobbit

**** False Positive Message ****
maven01-vb.hrz.tu-darmstadt.de:disk red [774507]
red Sat Oct 15 09:46:22 CEST 2016 - Filesystems NOT ok
&red 1% /run (406100% used) has reached the PANIC level (95%)

Filesystem                                             1024-blocks    Used 
Available Capacity Mounted on
udev                                                         10240       0     
10240       0% /dev
t
pfs   ]                                                   406356     256    
406100       1% /run
/dev/disk/by-uuid/298ee340-256f-4430-bba1-a14a475728c1    19222656 4254772  
13991348      24% /
tmpfs                                                         5120       0      
5120   
   0% /r]n/lock
tmpfs                                                      1398620       0   
1398620       0% /run/shm
/dev/sda1                                                   350275   19677    
311910       6% /boot
/dev/sda5                                                  8484528  220312   
7833216       3% /home
/dev/sdb1                                                 31391836 6749152  
23069824      23% /mnt/vol0

Since allowing larger client messages the issues are gone. The oversize message 
came from the machine reporting one or two client messages before. As far as I 
could reproduce it, the client message from the machine in between was 
completely ignored if the cause was two messages before.
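
The false alerts quoted in this thread share a recognizable signature: "used" 
percentages far above 100, or a free-units count of 18446744073709551615, 
which equals 2^64 - 1 and is what -1 looks like when stored as an unsigned 
64-bit number. As a rough sketch (the function name and the two regular 
expressions are my own, derived only from the alert lines shown in this 
thread), such lines could be flagged like this:

```python
import re

# 18446744073709551615: the value -1 takes when stored as unsigned 64-bit.
UINT64_MAX = 2**64 - 1

# Patterns taken from the alert lines quoted in this thread only.
USED_PCT = re.compile(r"\((\d+)% used\)")
FREE_UNITS = re.compile(r"\((\d+) units free\)")


def implausible_lines(status_text):
    """Return alert lines whose numbers cannot be real disk readings."""
    suspicious = []
    for line in status_text.splitlines():
        pct = USED_PCT.search(line)
        free = FREE_UNITS.search(line)
        if pct and int(pct.group(1)) > 100:
            # A filesystem cannot be more than 100% used.
            suspicious.append(line)
        elif free and int(free.group(1)) == UINT64_MAX:
            # An all-ones 64-bit value points at a bad parse, not a real size.
            suspicious.append(line)
    return suspicious
```

This only detects the symptom in a status message. It says nothing about the 
cause, and a corrupted line whose numbers happen to look plausible would slip 
through.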

Kind regards.
	Lars

-- 
man-da.de GmbH, AS8365                          Phone: +49 6151 16-71027
Mornewegstraße 30                               Fax: +49 6151 16-71198
D-64293 Darmstadt                               e-mail: lk at man-da.de
Geschäftsführer Marcus Stögbauer                AG Darmstadt, HRB 94 84