[Xymon] how to learn what is crashing my rrd handler?
John Thurston
john.thurston at alaska.gov
Wed Aug 27 21:10:16 CEST 2014
On 8/27/2014 9:27 AM, John Thurston wrote:
> On 8/26/2014 11:16 AM, J.C. Cleaver wrote:
>> On Tue, August 26, 2014 11:04 am, John Thurston wrote:
>>> I'm having difficulty with my RRD handlers crashing and leaving gaps in
>>> my databases.
>>>
>>> I mentioned this back in April, 2014 but received no responses:
>>> http://lists.xymon.com/pipermail/xymon/2014-April/039547.html
> - snip -
>>> I _suspect_ it is another client sending me empty messages, but how do I
>>> find it now that I have several hundred clients sending "data" messages?
>>> --
>
>> As an initial step, run xymond_rrd in --debug mode... You can send the
>> pid a -USR2 signal to toggle this setting without bouncing the process
>> itself (be sure you're signalling xymond_rrd and not its xymond_channel
>> parent).
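In C terms, the suggestion above amounts to calling kill() with SIGUSR2 on the xymond_rrd worker's own pid, which is what `kill -USR2 <pid>` does from a shell. The sketch below assumes you have already found that pid, for example in ps output, where the worker appears as a child of xymond_channel. The function name `toggle_debug` is hypothetical and not part of Xymon.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send SIGUSR2 to a running xymond_rrd worker so it
 * toggles its debug mode. worker_pid must be the xymond_rrd process itself,
 * not its xymond_channel parent. Returns 0 on success, -1 on error. */
int toggle_debug(pid_t worker_pid)
{
    /* Refuse 0, 1 and negative pids: kill() would then target process
     * groups, every process we may signal, or init. */
    if (worker_pid <= 1) {
        errno = EINVAL;
        return -1;
    }
    if (kill(worker_pid, SIGUSR2) != 0) {
        perror("kill(SIGUSR2)");
        return -1;
    }
    return 0;
}
```

Because the signal toggles the setting, sending it a second time should switch debug output off again without restarting the worker.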
>
> Ahh. I figured out what I was doing wrong. I was placing --debug in the
> wrong place on the line. When I put it at the end, I get debug output
> from the 'data' handler rather than the parent.
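The placement matters because, as I understand xymond_channel's usage, its own options come before the name of the worker program and the worker's options come after it. A tasks.cfg command line therefore has roughly the shape `xymond_channel --channel=data [channel options] xymond_rrd [worker options] --debug`. The sketch below is a simplified model of that split, not Xymon's actual argument parser: it assumes the first argument that does not start with "-" names the worker. Both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model (assumption): the first argument not beginning with '-'
 * is the worker program. Options before it go to xymond_channel, options
 * after it go to the worker. Returns argc if no worker program is found. */
int worker_index(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        if (argv[i][0] != '-')
            return i;
    }
    return argc;
}

/* Returns 1 if "--debug" appears after the worker name (so the worker gets
 * it), 0 if it appears only among the parent's options or not at all. */
int debug_goes_to_worker(int argc, char *argv[])
{
    int w = worker_index(argc, argv);
    for (int i = w + 1; i < argc; i++) {
        if (strcmp(argv[i], "--debug") == 0)
            return 1;
    }
    return 0;
}
```

With `--debug` before `xymond_rrd`, the model gives it to the parent, which matches the behaviour described: debug output came from the parent until the option was moved to the end of the line.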
>
> When I do so, the log contains:
>
>> 2014-08-27 09:02:07 Peer not up, flushing message queue
>> 2014-08-27 09:02:08 Peer not up, flushing message queue
>> 2014-08-27 09:02:09 Peer not up, flushing message queue
>> 2014-08-27 09:02:10 Peer not up, flushing message queue
>> 2014-08-27 09:02:10 Peer not up, flushing message queue
>> 2014-08-27 09:02:16 Peer not up, flushing message queue
>> 2014-08-27 09:02:17 Peer not up, flushing message queue
>> 4595 2014-08-27 09:02:19 Opening file /opt/xymon/server/etc/rrddefinitions.cfg
>> 4595 2014-08-27 09:02:19 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=528383
>> 4595 2014-08-27 09:02:19 Got 230 bytes
>> 4595 2014-08-27 09:02:19 xymond_rrd: Got message 2103
>> @@data#2103/soapsgdc02.soa.alaska.gov|1409158937.159587|10.210.36.22||soapsgdc02.soa.alaska.gov|trends||ETS/MsgDir
>>
- snip -
>> 4595 2014-08-27 09:02:19 Exp.len : 3
>> 4595 2014-08-27 09:02:19 Exp.ofs : 0
>> 4595 2014-08-27 09:02:19 Flags : 1
>> 4595 2014-08-27 09:02:19 Port : 22
>> 4595 2014-08-27 09:02:19 Name : telnet
>> 4595 2014-08-27 09:02:19 2014-08-27 09:02:21 Child process 4595 died: Signal 6
>> 2014-08-27 09:02:24 Peer at 0.0.0.0:0 failed: Broken pipe
>> 2014-08-27 09:02:24 Peer not up, flushing message queue
>> 2014-08-27 09:02:24 Peer not up, flushing message queue
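With debug on, each message's header line is logged, so the original question of which client sends empty "data" messages can be narrowed by pulling the host name out of that header and checking whether anything follows it. The sketch below assumes the layout seen in the logged line `@@data#2103/soapsgdc02.soa.alaska.gov|...`: host name between the first "/" and the next "|", header on the first line, payload after the first newline. Both function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the host name from a header like "@@data#2103/host|1409158937.15|..."
 * into out. Assumption: the host sits between the first '/' and the next
 * '|', as in the logged header. Returns 0 on success, -1 otherwise. */
int header_hostname(const char *msg, char *out, size_t outlen)
{
    const char *start = strchr(msg, '/');
    if (start == NULL)
        return -1;
    start++;
    const char *end = strchr(start, '|');
    if (end == NULL || outlen == 0)
        return -1;
    size_t len = (size_t)(end - start);
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Returns 1 if nothing but whitespace follows the header line (or there is
 * no newline at all), i.e. the message carries no data. Assumption: the
 * header is the first line and the payload follows it. */
int body_is_empty(const char *msg)
{
    const char *p = strchr(msg, '\n');
    if (p == NULL)
        return 1;
    for (p++; *p != '\0'; p++) {
        if (!isspace((unsigned char)*p))
            return 0;
    }
    return 1;
}
```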
>
> and the stack from the core file (using pstack)
>> fee5ebd4 _lwp_kill (6, 0, 0, fee3e0f0, ffffffff, 6) + 8
>> fedd29f0 abort (0, 1, 6666c, ffb04, feed5518, 0) + 110
>> 0003bb94 sigsegv_handler (b, 0, ffbfb588, 1, 0, 544a8) + 30
>> fee5b00c __sighndlr (b, 0, ffbfb588, 3bb64, 0, 1) + c
>> fee4f6bc call_user_handler (b, 0, 0, 0, fed32a00, ffbfb588) + 3b8
>> fee4f8a4 sigacthandler (b, 0, ffbfb588, 20, 0, 0) + 60
>> --- called from signal handler with signal 11 (SIGSEGV) ---
>> fedc2d50 strlen (53e37, ffbfc804, ffbfbdf9, 0, 0, 0) + 50
>> fee319d4 vfprintf (71990, 53e28, ffbfc800, 0, a0afc, fee314d4) + ec
>> 0002f24c dbgprintf (53e28, 0, e9768, 6f800, 71990, 6d000) + a0
>> 0003347c dump_tcp_services (53e88, 53ea0, 53eb8, c0, a0, 67f98) + a0
>> 00033d70 init_tcp_services (168a78, 620, 67f98, 54060, 600, 168430) + 848
>> 0002f858 rrd_setup (98906, 6d000, 6d000, 80808080, 6d000, 0) + 164
>> 0002fc4c find_xymon_rrd (988f4, 492e8, 53fe08c6, 53fe08c6, 988c2, 2e) + 4
>> 00048cb0 main (98907, ffbfdba4, 988fc, 68800, 3, 49528) + 728
>> 00015d2c _start (0, 0, 0, 0, 0, 0) + 5c
>
> Which, if I'm reading it correctly, makes me think the application tried
> to read off the end of a string.
And I think the thing which ran off the end of the string was the debug
code :p I'm still looking at the source (and my C is very, very bad), but
I suspect the debug printing is choking on the absence of an
attribute in protocols.cfg.
When I turn debug off, my rrd handlers are working much better.
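One plausible reading of the stack above, consistent with this suspicion, is that dump_tcp_services passed a NULL string (an attribute never set in protocols.cfg) to a `%s` conversion. The C standard leaves that undefined: glibc prints "(null)", while other C libraries can fault inside strlen(), which matches the vfprintf/strlen frames shown. The sketch below shows the usual defensive pattern. `struct svc_entry`, its fields, and both functions are hypothetical stand-ins, not Xymon's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Return a printable stand-in for a possibly-NULL string, so a missing
 * attribute never reaches a %s conversion as NULL. */
static const char *safe_str(const char *s)
{
    return (s != NULL) ? s : "(null)";
}

/* Hypothetical record standing in for one protocols.cfg entry; the field
 * names are illustrative only. */
struct svc_entry {
    const char *name;
    const char *send_txt;   /* NULL when the attribute is absent */
    int port;
};

/* Format one debug line for the entry into buf without ever passing NULL
 * to %s. Returns snprintf's result (length the full line would need). */
int format_svc_entry(char *buf, size_t len, const struct svc_entry *e)
{
    return snprintf(buf, len, "Name : %s, Port : %d, Send : %s",
                    safe_str(e->name), e->port, safe_str(e->send_txt));
}
```

Guarding at the print site keeps debug output usable even when a protocols.cfg entry omits an attribute, rather than requiring every entry to be complete.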
--
Do things because you should, not just because you can.
John Thurston 907-465-8591
John.Thurston at alaska.gov
Enterprise Technology Services
Department of Administration
State of Alaska