[Xymon] how to learn what is crashing my rrd handler?

John Thurston john.thurston at alaska.gov
Wed Aug 27 19:27:23 CEST 2014


On 8/26/2014 11:16 AM, J.C. Cleaver wrote:
> On Tue, August 26, 2014 11:04 am, John Thurston wrote:
>> I'm having difficulty with my RRD handlers crashing and leaving gaps in
>> my databases.
>>
>> I mentioned this back in April, 2014 but received no responses:
>> http://lists.xymon.com/pipermail/xymon/2014-April/039547.html
- snip -
>> I _suspect_ it is another client sending me empty messages, but how do I
>> find it now that I have several hundred clients sending "data" messages?
>> --

> As an initial step, run xymond_rrd in --debug mode... You can send the pid
> a -USR2 signal to toggle this setting without bouncing the process itself
> (be sure you're signalling xymond_rrd and not its xymond_channel parent).

Ahh. I figured out what I was doing wrong. I was placing --debug in the 
wrong place on the line. When I put it at the end, I get debug output 
from the 'data' handler rather than the parent.

When I do so, the log contains:

> 2014-08-27 09:02:07 Peer not up, flushing message queue
> 2014-08-27 09:02:08 Peer not up, flushing message queue
> 2014-08-27 09:02:09 Peer not up, flushing message queue
> 2014-08-27 09:02:10 Peer not up, flushing message queue
> 2014-08-27 09:02:10 Peer not up, flushing message queue
> 2014-08-27 09:02:16 Peer not up, flushing message queue
> 2014-08-27 09:02:17 Peer not up, flushing message queue
> 4595 2014-08-27 09:02:19 Opening file /opt/xymon/server/etc/rrddefinitions.cfg
> 4595 2014-08-27 09:02:19 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=528383
> 4595 2014-08-27 09:02:19 Got 230 bytes
> 4595 2014-08-27 09:02:19 xymond_rrd: Got message 2103 @@data#2103/soapsgdc02.soa.alaska.gov|1409158937.159587|10.210.36.22||soapsgdc02.soa.alaska.gov|trends||ETS/MsgDir
> 4595 2014-08-27 09:02:19 startpos 230, fillpos 230, endpos -1
> 4595 2014-08-27 09:02:19 Transport setup is:
> 4595 2014-08-27 09:02:19 xymondportnumber = 1984
> 4595 2014-08-27 09:02:19 xymonproxyhost = NONE
> 4595 2014-08-27 09:02:19 xymonproxyport = 0
> 4595 2014-08-27 09:02:19 Recipient listed as '146.63.81.42'
> 4595 2014-08-27 09:02:19 Standard protocol on port 1984
> 4595 2014-08-27 09:02:19 Will connect to address 146.63.81.42 port 1984
> 4595 2014-08-27 09:02:19 Connect status is 0
> 4595 2014-08-27 09:02:19 Sent 16 bytes
> 4595 2014-08-27 09:02:19 Read 8192 bytes
> 4595 2014-08-27 09:02:19 Read 32767 bytes
> 4595 2014-08-27 09:02:19 Read 28440 bytes
> 4595 2014-08-27 09:02:19 Closing connection
> 4595 2014-08-27 09:02:19 Opening file /opt/xymon/server/etc/analysis.cfg
> 4595 2014-08-27 09:02:19 Opening file /opt/xymon/server/etc/protocols.cfg
> 4595 2014-08-27 09:02:19 Opening file /opt/xymon/server/etc/protocols.d/soaprotocols.cfg
> 4595 2014-08-27 09:02:19 Service list dump
> 4595 2014-08-27 09:02:19  Name      : ftp
> 4595 2014-08-27 09:02:19    Sendtext: quit\r\n
> 4595 2014-08-27 09:02:19    Sendlen : 6
> 4595 2014-08-27 09:02:19    Exp.text: 220
> 4595 2014-08-27 09:02:19    Exp.len : 3
> 4595 2014-08-27 09:02:19    Exp.ofs : 0
> 4595 2014-08-27 09:02:19    Flags   : 1
> 4595 2014-08-27 09:02:19    Port    : 21
> 4595 2014-08-27 09:02:19  Name      : ftps
> 4595 2014-08-27 09:02:19    Sendtext: quit\r\n
> 4595 2014-08-27 09:02:19    Sendlen : 6
> 4595 2014-08-27 09:02:19    Exp.text: 220
> 4595 2014-08-27 09:02:19    Exp.len : 3
> 4595 2014-08-27 09:02:19    Exp.ofs : 0
> 4595 2014-08-27 09:02:19    Flags   : 5
> 4595 2014-08-27 09:02:19    Port    : 990
> 4595 2014-08-27 09:02:19  Name      : ssh
> 4595 2014-08-27 09:02:19    Sendtext: SSH-2.0-OpenSSH_4.1\r\n
> 4595 2014-08-27 09:02:19    Sendlen : 21
> 4595 2014-08-27 09:02:19    Exp.text: SSH
> 4595 2014-08-27 09:02:19    Exp.len : 3
> 4595 2014-08-27 09:02:19    Exp.ofs : 0
> 4595 2014-08-27 09:02:19    Flags   : 1
> 4595 2014-08-27 09:02:19    Port    : 22
> 4595 2014-08-27 09:02:19  Name      : ssh1
> 4595 2014-08-27 09:02:19    Sendtext: SSH-2.0-OpenSSH_4.1\r\n
> 4595 2014-08-27 09:02:19    Sendlen : 21
> 4595 2014-08-27 09:02:19    Exp.text: SSH
> 4595 2014-08-27 09:02:19    Exp.len : 3
> 4595 2014-08-27 09:02:19    Exp.ofs : 0
> 4595 2014-08-27 09:02:19    Flags   : 1
> 4595 2014-08-27 09:02:19    Port    : 22
> 4595 2014-08-27 09:02:19  Name      : ssh2
> 4595 2014-08-27 09:02:19    Sendtext: SSH-2.0-OpenSSH_4.1\r\n
> 4595 2014-08-27 09:02:19    Sendlen : 21
> 4595 2014-08-27 09:02:19    Exp.text: SSH
> 4595 2014-08-27 09:02:19    Exp.len : 3
> 4595 2014-08-27 09:02:19    Exp.ofs : 0
> 4595 2014-08-27 09:02:19    Flags   : 1
> 4595 2014-08-27 09:02:19    Port    : 22
> 4595 2014-08-27 09:02:19  Name      : telnet
> 4595 2014-08-27 09:02:19 2014-08-27 09:02:21 Child process 4595 died: Signal 6
> 2014-08-27 09:02:24 Peer at 0.0.0.0:0 failed: Broken pipe
> 2014-08-27 09:02:24 Peer not up, flushing message queue
> 2014-08-27 09:02:24 Peer not up, flushing message queue
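To see which incoming message a handler was working on when it died, a log like the one above can be scanned mechanically. This is only a minimal sketch: the two patterns are inferred from the lines quoted here ("xymond_rrd: Got message ..." and "Child process ... died: Signal ..."), and the function name is hypothetical. Other Xymon versions or log settings may format these lines differently.

```python
import re

# Both patterns are assumptions based only on the log lines quoted above.
GOT_MSG = re.compile(
    r"(\d+) \d{4}-\d\d-\d\d \d\d:\d\d:\d\d xymond_rrd: Got message (\d+) (\S+)"
)
DIED = re.compile(r"Child process (\d+) died: Signal (\d+)")


def last_message_before_death(lines):
    """Return one (pid, signal, seq, header) tuple per child death in the log.

    seq and header describe the last "Got message" line logged by that pid,
    or are None if the pid never logged one.
    """
    last_seen = {}  # pid -> (sequence number, message header)
    deaths = []
    for line in lines:
        got = GOT_MSG.search(line)
        if got:
            pid, seq, header = got.groups()
            last_seen[pid] = (int(seq), header)
            continue
        # search() rather than match(): the "died" text can be fused onto a
        # half-written line from the child, as in the log above.
        died = DIED.search(line)
        if died:
            pid, sig = died.groups()
            seq, header = last_seen.get(pid, (None, None))
            deaths.append((pid, int(sig), seq, header))
    return deaths
```

Any iterable of lines works as input, including an open log file. The header is returned unparsed because the meaning of its "|"-separated fields is not confirmed here.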

The stack from the core file (using pstack) shows:
>  fee5ebd4 _lwp_kill (6, 0, 0, fee3e0f0, ffffffff, 6) + 8
>  fedd29f0 abort    (0, 1, 6666c, ffb04, feed5518, 0) + 110
>  0003bb94 sigsegv_handler (b, 0, ffbfb588, 1, 0, 544a8) + 30
>  fee5b00c __sighndlr (b, 0, ffbfb588, 3bb64, 0, 1) + c
>  fee4f6bc call_user_handler (b, 0, 0, 0, fed32a00, ffbfb588) + 3b8
>  fee4f8a4 sigacthandler (b, 0, ffbfb588, 20, 0, 0) + 60
>  --- called from signal handler with signal 11 (SIGSEGV) ---
>  fedc2d50 strlen   (53e37, ffbfc804, ffbfbdf9, 0, 0, 0) + 50
>  fee319d4 vfprintf (71990, 53e28, ffbfc800, 0, a0afc, fee314d4) + ec
>  0002f24c dbgprintf (53e28, 0, e9768, 6f800, 71990, 6d000) + a0
>  0003347c dump_tcp_services (53e88, 53ea0, 53eb8, c0, a0, 67f98) + a0
>  00033d70 init_tcp_services (168a78, 620, 67f98, 54060, 600, 168430) + 848
>  0002f858 rrd_setup (98906, 6d000, 6d000, 80808080, 6d000, 0) + 164
>  0002fc4c find_xymon_rrd (988f4, 492e8, 53fe08c6, 53fe08c6, 988c2, 2e) + 4
>  00048cb0 main     (98907, ffbfdba4, 988fc, 68800, 3, 49528) + 728
>  00015d2c _start   (0, 0, 0, 0, 0, 0) + 5c

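If I'm reading it correctly, the SIGSEGV is raised in strlen, called
from vfprintf via dbgprintf in dump_tcp_services, which makes me think
the application tried to read off the end of a string. That also fits
the log, which stops right after the "Name      : telnet" line.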
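The stack above is read from the "called from signal handler" marker downward: frames above the marker belong to the signal handler and abort(), and frames below it lead to the fault. A small sketch of that filtering follows. The regular expression is an assumption that fits the pstack lines quoted here, and the function name is hypothetical.

```python
import re

# Matches frame lines shaped like the ones quoted above:
#   "<hex address> <function> (<args>) + <offset>"
# An optional leading ">" lets quoted mail text be pasted in unchanged.
FRAME = re.compile(r"^\s*(?:>\s*)?[0-9a-f]+\s+(\w+)\s*\(")
SIGNAL_MARKER = "called from signal handler"


def frames_at_fault(pstack_lines):
    """Return function names below the signal marker, innermost frame first."""
    below_marker = False
    frames = []
    for line in pstack_lines:
        if SIGNAL_MARKER in line:
            below_marker = True
            continue
        match = FRAME.match(line)
        if below_marker and match:
            frames.append(match.group(1))
    return frames
```

For the stack quoted above this yields strlen, vfprintf, dbgprintf, dump_tcp_services and so on down to _start, which puts the fault in a string being formatted for debug output.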

-- 
    Do things because you should, not just because you can.

John Thurston    907-465-8591
John.Thurston at alaska.gov
Enterprise Technology Services
Department of Administration
State of Alaska


