[Xymon] how to learn what is crashing my rrd handler?

John Thurston john.thurston at alaska.gov
Wed Aug 27 21:10:16 CEST 2014


On 8/27/2014 9:27 AM, John Thurston wrote:
> On 8/26/2014 11:16 AM, J.C. Cleaver wrote:
>> On Tue, August 26, 2014 11:04 am, John Thurston wrote:
>>> I'm having difficulty with my RRD handlers crashing and leaving gaps in
>>> my databases.
>>>
>>> I mentioned this back in April, 2014 but received no responses:
>>> http://lists.xymon.com/pipermail/xymon/2014-April/039547.html
> - snip -
>>> I _suspect_ it is another client sending me empty messages, but how do I
>>> find it now that I have several hundred clients sending "data" messages?
>>> --
>
>> As an initial step, run xymond_rrd in --debug mode... You can send the
>> pid a -USR2 signal to toggle this setting without bouncing the process
>> itself (be sure you're signalling xymond_rrd and not its xymond_channel
>> parent).
>
> Ahh. I figured out what I was doing wrong. I was placing --debug in the
> wrong place on the line. When I put it at the end, I get debug output
> from the 'data' handler rather than the parent.
>
> When I do so, the log contains:
>
>> 2014-08-27 09:02:07 Peer not up, flushing message queue
>> 2014-08-27 09:02:08 Peer not up, flushing message queue
>> 2014-08-27 09:02:09 Peer not up, flushing message queue
>> 2014-08-27 09:02:10 Peer not up, flushing message queue
>> 2014-08-27 09:02:10 Peer not up, flushing message queue
>> 2014-08-27 09:02:16 Peer not up, flushing message queue
>> 2014-08-27 09:02:17 Peer not up, flushing message queue
>> 4595 2014-08-27 09:02:19 Opening file /opt/xymon/server/etc/rrddefinitions.cfg
>> 4595 2014-08-27 09:02:19 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=528383
>> 4595 2014-08-27 09:02:19 Got 230 bytes
>> 4595 2014-08-27 09:02:19 xymond_rrd: Got message 2103
>> @@data#2103/soapsgdc02.soa.alaska.gov|1409158937.159587|10.210.36.22||soapsgdc02.soa.alaska.gov|trends||ETS/MsgDir
>>
> - snip -
>> 4595 2014-08-27 09:02:19    Exp.len : 3
>> 4595 2014-08-27 09:02:19    Exp.ofs : 0
>> 4595 2014-08-27 09:02:19    Flags   : 1
>> 4595 2014-08-27 09:02:19    Port    : 22
>> 4595 2014-08-27 09:02:19  Name      : telnet
>> 4595 2014-08-27 09:02:19 2014-08-27 09:02:21 Child process 4595 died:
>> Signal 6
>> 2014-08-27 09:02:24 Peer at 0.0.0.0:0 failed: Broken pipe
>> 2014-08-27 09:02:24 Peer not up, flushing message queue
>> 2014-08-27 09:02:24 Peer not up, flushing message queue
>
> and the stack from the core file (using pstack)
>>  fee5ebd4 _lwp_kill (6, 0, 0, fee3e0f0, ffffffff, 6) + 8
>>  fedd29f0 abort    (0, 1, 6666c, ffb04, feed5518, 0) + 110
>>  0003bb94 sigsegv_handler (b, 0, ffbfb588, 1, 0, 544a8) + 30
>>  fee5b00c __sighndlr (b, 0, ffbfb588, 3bb64, 0, 1) + c
>>  fee4f6bc call_user_handler (b, 0, 0, 0, fed32a00, ffbfb588) + 3b8
>>  fee4f8a4 sigacthandler (b, 0, ffbfb588, 20, 0, 0) + 60
>>  --- called from signal handler with signal 11 (SIGSEGV) ---
>>  fedc2d50 strlen   (53e37, ffbfc804, ffbfbdf9, 0, 0, 0) + 50
>>  fee319d4 vfprintf (71990, 53e28, ffbfc800, 0, a0afc, fee314d4) + ec
>>  0002f24c dbgprintf (53e28, 0, e9768, 6f800, 71990, 6d000) + a0
>>  0003347c dump_tcp_services (53e88, 53ea0, 53eb8, c0, a0, 67f98) + a0
>>  00033d70 init_tcp_services (168a78, 620, 67f98, 54060, 600, 168430) + 848
>>  0002f858 rrd_setup (98906, 6d000, 6d000, 80808080, 6d000, 0) + 164
>>  0002fc4c find_xymon_rrd (988f4, 492e8, 53fe08c6, 53fe08c6, 988c2, 2e) + 4
>>  00048cb0 main     (98907, ffbfdba4, 988fc, 68800, 3, 49528) + 728
>>  00015d2c _start   (0, 0, 0, 0, 0, 0) + 5c
>
> Which, if I'm reading it correctly, makes me think the application tried
> to read off the end of a string.

And I think the thing which ran off the end of the string was the debug
output itself :p I'm still looking at the source (and my C is very, very
bad), but I suspect the debug print routine (dbgprintf, called from
dump_tcp_services in the stack above) is choking on the absence of an
attribute in protocols.cfg.

When I turn debug off, my rrd handlers are working much better.
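
For illustration: one common way for vfprintf to fault inside strlen is a
NULL (or otherwise bad) pointer handed to a %s conversion. Passing NULL to
%s is undefined behavior in C, and some libc implementations crash rather
than print "(null)". Whether that is what actually happens in
dump_tcp_services is an assumption I have not confirmed in the source. A
minimal sketch of a guard, where struct svc_entry, printable() and
format_entry() are hypothetical names and not Xymon code:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one service entry read from a config file.
 * This is NOT the structure Xymon uses; it only illustrates the idea. */
struct svc_entry {
    const char *name;      /* service name, e.g. "telnet" */
    const char *sendtxt;   /* optional attribute; NULL when absent */
    int port;
};

/* Return a printable string even when the attribute was never set,
 * so a %s conversion never receives a NULL pointer. */
static const char *printable(const char *s)
{
    return s ? s : "(unset)";
}

/* Format one entry into buf; returns snprintf's result, i.e. the number
 * of characters that would be written, excluding the terminating NUL. */
static int format_entry(char *buf, size_t buflen, const struct svc_entry *e)
{
    return snprintf(buf, buflen, "Name: %s, Send: %s, Port: %d",
                    printable(e->name), printable(e->sendtxt), e->port);
}
```

With a guard like this, an entry whose optional string is missing is
printed as "(unset)" instead of being passed to %s as a NULL pointer.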
-- 
    Do things because you should, not just because you can.

John Thurston    907-465-8591
John.Thurston at alaska.gov
Enterprise Technology Services
Department of Administration
State of Alaska


