[Xymon] white gaps in graphs across a number of services

Vincent Baines vincent.baines at excelian.com
Wed Jun 20 13:29:22 CEST 2012


Well, I'm still getting these issues despite tidying a lot of errors away; I had quite a few misses last night. A selection of the error messages I get:
A lot of these:
2012-06-20 11:13:17 xymond_rrd: Got message 460528, expected 460520
2012-06-20 11:14:22 xymond_rrd: Got message 460720, expected 460712
2012-06-20 11:15:41 xymond_rrd: Got message 461145, expected 461133
2012-06-20 11:18:15 xymond_rrd: Got message 462593, expected 462584
2012-06-20 11:18:19 Peer at 0.0.0.0:0 failed: Broken pipe
27089 2012-06-20 11:18:19 Semaphore wait aborted: Interrupted system call
2012-06-20 11:18:19 Peer not up, flushing message queue
27089 2012-06-20 11:18:19 Connecting to peer 0.0.0.0:0
27089 2012-06-20 11:18:19 Peer is UP
2012-06-20 11:18:19 Unknown token 'MEMSTAT' ignored at line 385

At the time of some of the gaps I get these:
2012-06-20 02:00:57 xymond_rrd: Got message 242464, expected 242463
2012-06-20 02:01:06 Flushed 12 stale messages for 0.0.0.0:0
2012-06-20 02:01:07 Flushed 4 stale messages for 0.0.0.0:0
2012-06-20 02:01:08 xymond_rrd: Got message 242493, expected 242476
2012-06-20 02:01:09 Flushed 5 stale messages for 0.0.0.0:0
2012-06-20 02:01:10 xymond_rrd: Got message 242512, expected 242507
2012-06-20 02:01:36 Flushed 9 stale messages for 0.0.0.0:0
2012-06-20 02:01:37 Flushed 11 stale messages for 0.0.0.0:0
2012-06-20 02:01:38 Flushed 9 stale messages for 0.0.0.0:0
2012-06-20 02:01:39 Flushed 11 stale messages for 0.0.0.0:0
2012-06-20 02:01:39 xymond_rrd: Got message 242703, expected 242663
2012-06-20 02:01:40 xymond_rrd: Got message 242799, expected 242797
2012-06-20 02:01:52 xymond_rrd: Got message 242855, expected 242846
2012-06-20 02:01:53 xymond_rrd: Got message 242874, expected 242866
(and even more in rrd-data.log)
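
To see whether these sequence jumps line up with the graph gaps, the log lines can be tallied per minute. This is only a rough sketch: the function name is made up for illustration, and it assumes that the difference between the "Got" and "expected" numbers approximates how many messages xymond_rrd never received.

```python
import re

# Matches lines such as:
#   2012-06-20 11:13:17 xymond_rrd: Got message 460528, expected 460520
# and captures the timestamp truncated to the minute plus both numbers.
GAP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} .*Got message (\d+), expected (\d+)"
)


def skipped_per_minute(lines):
    """Return {minute: total of (got - expected)} over all sequence-gap lines.

    Assumption: got - expected approximates the number of missed messages.
    """
    totals = {}
    for line in lines:
        m = GAP_RE.search(line)
        if not m:
            continue  # not a sequence-gap line (e.g. "Flushed ... stale messages")
        minute, got, expected = m.group(1), int(m.group(2)), int(m.group(3))
        totals[minute] = totals.get(minute, 0) + (got - expected)
    return totals


sample = [
    "2012-06-20 02:00:57 xymond_rrd: Got message 242464, expected 242463",
    "2012-06-20 02:01:06 Flushed 12 stale messages for 0.0.0.0:0",
    "2012-06-20 02:01:08 xymond_rrd: Got message 242493, expected 242476",
    "2012-06-20 02:01:10 xymond_rrd: Got message 242512, expected 242507",
]

for minute, skipped in sorted(skipped_per_minute(sample).items()):
    print(minute, skipped)
```

Feeding it the whole of rrd-data.log instead of the sample list would give a per-minute count to hold against the times of the white gaps.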


and quite a few of these:
2012-06-20 10:46:57 RRD error updating /xymon/data/rrd/hostname1/allext.rrd from 172.30.166.218: /xymon/data/rrd/hostname1/allext.rrd: found extra data on update argument: 46:+2:0.28:80:91.5:64:13:00:04:00:00:00:23:20:00:25:45:29:21:30:44:03:00:54:41:59:42:09:29:51:11:01:50:39:52:59

I'm guessing the latter might be why I see random RRD files being created, since there are some strange characters in there. But I've added an echo to the custom script to log what it sends to Xymon, and so far its output is what I'd expect. Is some sort of corruption possible, e.g. two updates arriving at exactly the same time and corrupting each other?
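
One way to narrow this down is to take the text after "found extra data on update argument:" and check how many colon-separated fields it holds and whether any of them is not a plain number. The helper below is a hypothetical sketch, not part of Xymon or rrdtool; expected_fields is an assumed value that would have to come from the number of data sources in the RRD file (for example as listed by "rrdtool info").

```python
import re

# A field that looks like a plain number, optionally signed, e.g. 46, +2, 0.28.
NUMERIC_RE = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$")


def inspect_update_fields(argument, expected_fields=None):
    """Summarise the colon-separated fields of an update argument.

    "U" is treated as acceptable because rrdtool uses it for unknown values.
    expected_fields is an assumption supplied by the caller.
    """
    fields = argument.split(":")
    bad = [f for f in fields if f != "U" and not NUMERIC_RE.match(f)]
    report = {"count": len(fields), "non_numeric": bad}
    if expected_fields is not None:
        # Positive means more fields were sent than the caller expected.
        report["difference"] = len(fields) - expected_fields
    return report


# Shortened, made-up argument for illustration only.
print(inspect_update_fields("46:+2:0.28:x1", expected_fields=3))
```

Running the same check on the lines the custom script echoes before sending would show whether the surplus fields are already present on the client side or only appear on the server.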

Any suggestions?

Thanks!
________________________________________
From: cleaver at terabithia.org [cleaver at terabithia.org]
Sent: 18 June 2012 20:47
To: Vincent Baines
Cc: Xymon Email List
Subject: RE: [Xymon] white gaps in graphs across a number of services

No problem.. It can be confusing with long process chains like this :)

In tasks.cfg, in [xymond] put it straight after the xymond in the CMD
line. In [rrdstatus] and [rrddata], put it immediately after the
"xymond_rrd" (not xymond_channel).
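
To make the placement concrete, here is a small sketch that puts --debug directly after a named program in a CMD line. The function and the example CMD lines are illustrative assumptions, not copied from a real tasks.cfg; your own lines will carry different options.

```python
def add_debug_flag(cmd_line, program):
    """Insert --debug right after the first token naming `program`.

    The match is on the whole token (or its basename), so asking for
    "xymond" does not match "xymond_channel" or "xymond_rrd".
    """
    tokens = cmd_line.split()
    for i, tok in enumerate(tokens):
        if tok == program or tok.endswith("/" + program):
            if i + 1 < len(tokens) and tokens[i + 1] == "--debug":
                return cmd_line  # flag is already in place
            return " ".join(tokens[: i + 1] + ["--debug"] + tokens[i + 1 :])
    return cmd_line  # program not found, leave the line alone


# Hypothetical CMD lines for illustration only.
rrd_cmd = "CMD xymond_channel --channel=status xymond_rrd --rrddir=/xymon/data/rrd"
xymond_cmd = "CMD xymond --restart=/tmp/xymond.chk"

print(add_debug_flag(rrd_cmd, "xymond_rrd"))
print(add_debug_flag(xymond_cmd, "xymond"))
```

The point is only the position: for [rrdstatus] and [rrddata] the flag belongs to xymond_rrd, so it goes after that word and not after xymond_channel.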


-jc



> Sorry.. hopefully not a stupid question, but where should I put the
> --debug flag? I've done this before where I think I've enabled debug, but
> haven't and become happy because there were no debug errors!
>
> The logs are a bit messy at the moment, I'm trying to get rid of some of
> the errors, the main culprits are too many data sources for the RRD files,
> which I can't really explain as they work sometimes, and some cases of the
> message relating to 'expected message number XXX and received message
> number XXY' - sometimes just one or two but sometimes a lot in one go.
> ________________________________________
> From: cleaver at terabithia.org [cleaver at terabithia.org]
> Sent: 18 June 2012 19:29
> To: Vincent Baines
> Cc: xymon at xymon.com
> Subject: Re: [Xymon] white gaps in graphs across a number of services
>
> Do you see anything unusual in the xymond_rrd or xymond log(s) around that
> time? If messages are dropping to zero, it could definitely be a crash
> somewhere.
>
> If nothing interesting shows up, try running both with --debug enabled as
> well... We might get a better idea of why that's happening.
>
> Regards,
>
> -jc
>
>
>> Hi Everyone,
>>
>>
>>
>> I have been looking on and off at a problem I've seen for a while now,
>> without massive success. I see intermittent 'white gaps' occurring in
>> Xymon results across a number of services, sometimes at corresponding
>> times, but sometimes not. Most frequently I see this gap for CPU load,
>> and it isn't specific to one server.
>>
>> Attached is an example of users and processes from one client server.
>> There is a corresponding gap for the approx 3AM gap in the CPU
>> utilization graphs, the memory graphs, actually all of them I think,
>> and a large 300-second spike in clock offset at that time. But there is
>> nothing corresponding to the other gaps.
>>
>>
>>
>> If I look at the Xymon server itself, it looks like there was something
>> up at that time too, as xymond incoming messages drops to zero. For the
>> rest of the day it holds at a steady number. But there are gaps all
>> over the place in xymonnet runtime, CPU utilization, users and procs,
>> etc.
>>
>>
>>
>> I seem to recall we did try to tweak some rrd cache value as it cropped
>> up in another post, which I think improved things slightly. But, we are
>> having problems with the platforms that we're trying to monitor, with
>> apparent long NFS pings between boxes.
>>
>>
>>
>> The xymon server itself is running on a VM box. Has anyone had issues
>> running on VM?
>>
>>
>>
>> As best I can figure, either we have a Xymon config issue, the Xymon box
>> itself isn't stable and is dropping data, or we have genuine network /
>> disk write issues.
>>
>>
>>
>> Any other thoughts?
>>
>>
>>
>> Cheers!
>>
>> The information contained in this email and any attached files is
>> confidential and intended solely for the addressee(s). The email may be
>> legally privileged or prohibited from disclosure and unauthorised use.
>> If
>> you are not the named addressee you may not use, copy, or disclose this
>> information to any other person. If you received this message in error
>> please notify the sender immediately and delete it from your system.
>>
>> Any opinion or views contained in this email message are those of the
>> sender, and do not represent those of the Company in any way and
>> reliance
>> should not be placed upon its contents. Unless otherwise stated, this
>> email message is not intended to be contractually binding. Where an
>> Agreement exists between our respective companies and there is conflict
>> between the contents of this email message and the Agreement then the
>> terms of that Agreement shall prevail.
>>
>> Excelian Limited
>> 44 Featherstone Street
>> London
>> EC1Y 8RN
>> Tel: +44 (0) 20 7336 9595
>> www.Excelian.com
>> _____________________________________________________________________
>> This e-mail has been scanned for viruses by MessageLabs. For further
>> information visit http://www.messagelabs.com
>>
>> Excelian subscribes to cleaner and greener methods of working. Help take
>> responsibility for the environment. Please don't print this email unless
>> you absolutely have to.
>> _______________________________________________
>> Xymon mailing list
>> Xymon at xymon.com
>> http://lists.xymon.com/mailman/listinfo/xymon
>>
>

