[Xymon] rrd logs and graphs
Vernon Everett
everett.vernon at gmail.com
Sun Mar 22 01:19:35 CET 2015
I was hoping to get approval to patch the production system, but change
control at $CLIENT is sclerotic on a good day.
Your suggestion is probably the way I will do it.
Back there on Wednesday. Watch this space.
Cheers
Vernon
On 21 March 2015 at 22:40, Galen Johnson <solitaryr at gmail.com> wrote:
> Stupid question but can't you just set up a test Xymon server and have the
> client system point to both servers? That way you can update the test
> instance and do whatever you need to tweak it without impacting prod.
>
> On Sat, Mar 21, 2015 at 6:31 AM, Vernon Everett <everett.vernon at gmail.com>
> wrote:
>
>> Very confused now.
>> In the test graph, showing the history graphs, the URL contains
>> service=ncv:power
>> And in the history graphs in the status, it's this.
>> service=power
>>
>> That doesn't confuse me as much as what the graphs look like now.
>> Both the test and the trends graph now contain the spurious values.
>> Yesterday, they were only in the trends graph.
>>
>> I need to get that debug output fixed.
>>
>> Regards
>> Vernon
>>
>>
>>
>> On 21 March 2015 at 17:48, Jeremy Laidman <jlaidman at rebel-it.com.au>
>> wrote:
>>
>>> So the URLs are different? But both have service=power in the URLs?
>>>
>>> On Sat, 21 Mar 2015 10:16 Vernon Everett <everett.vernon at gmail.com>
>>> wrote:
>>>
>>>> Hi Jeremy
>>>>
>>>> That thought occurred to me, but I checked.
>>>> There is only one [power] entry in the graphs.cfg file.
>>>> And I put it there for this particular test.
>>>>
>>>> Would have made this one too easy if it was that. :-)
>>>>
>>>> Regards
>>>> Vernon
>>>>
>>>>
>>>> On 20 March 2015 at 16:43, Jeremy Laidman <jlaidman at rebel-it.com.au>
>>>> wrote:
>>>>
>>>>> Vernon
>>>>>
>>>>> The power status page must refer to a different graph name in
>>>>> graphs.cfg with a different FNPATTERN.
>>>>>
>>>>> Click on the graphs images for each version to get the 4-graph view
>>>>> and compare the URLs.
>>>>>
>>>>> J
>>>>>
>>>>> On Fri, 20 Mar 2015 19:35 Vernon Everett <everett.vernon at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all
>>>>>>
>>>>>> I was only back at the client today, and unfortunately have not
>>>>>> managed to get that patch in yet.
>>>>>> (As I mentioned before, it's a production system)
>>>>>>
>>>>>> However, I did notice something really odd.
>>>>>> I have focused my attention on the trends graphs, where I get all the
>>>>>> extra values, but it's not happening in the test itself, despite the
>>>>>> existence of the additional rrd files.
>>>>>>
>>>>>> Example.
>>>>>> I have something that plots the power usage of the PSUs on a NetApp
>>>>>> e-series.
>>>>>> There are 4 PSUs, output looks like this.
>>>>>>
>>>>>> Total power drawn- 487 Watts
>>>>>> Number of trays- 2
>>>>>> Tray power input details-
>>>>>>
>>>>>> TRAY ID POWER SUPPLY SERIAL NUMBER INPUT POWER
>>>>>> 99 0 145 Watts
>>>>>> 99 1 151 Watts
>>>>>> 0 0 99 Watts
>>>>>> 0 1 92 Watts
>>>>>>
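A minimal sketch of how a report like the one above could be flattened into NCV-style "name : value" lines. This is not the script used at the client. The function name to_ncv and the row regex are made up for illustration, and the Tray<id>_PSU<n> naming is only inferred from the valid rrd file names listed further down in this message. The rows are parsed exactly as they appear in this mail (tray id, PSU number, watts).

```python
import re

# Sample rows copied from the report quoted above.
SAMPLE = """\
TRAY ID POWER SUPPLY SERIAL NUMBER INPUT POWER
99 0 145 Watts
99 1 151 Watts
0 0 99 Watts
0 1 92 Watts
"""

# Hypothetical row format: <tray id> <psu number> <watts> Watts
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+Watts\s*$")


def to_ncv(report):
    """Return one 'TrayX_PSUY : watts' line per data row; skip everything else."""
    lines = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            tray, psu, watts = m.groups()
            lines.append("Tray%s_PSU%s : %s" % (tray, psu, watts))
    return lines


for ncv_line in to_ncv(SAMPLE):
    print(ncv_line)
```

Only lines that match the row pattern become data points, so headers and summary lines such as "Total power drawn" are ignored in this sketch.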
>>>>>> All good. And I have a graph with 4 lines. Min, Max, Curr and Avg
>>>>>> values are all there. It looks beautiful.
>>>>>> But go look at the power graph in trends, and it's ugly.
>>>>>> Heaps of additional data lines with no entries. All values are NaN.
>>>>>> And mixed in amongst the additional empty graphs are the 4 valid
>>>>>> lines.
>>>>>>
>>>>>> I look at the rrd files, and they are all there, even the bad ones.
>>>>>> Here's a few of them.
>>>>>> power,tcpListenDrop.rrd
>>>>>> power,tcpOutAck.rrd
>>>>>> power,tcpOutDataSegs.rrd
>>>>>> power,tcpOutRsts.rrd
>>>>>> power,tcpOutUrg.rrd
>>>>>> power,tcpOutWinProbe.rrd
>>>>>> power,tcpRetransSegs.rrd
>>>>>> power,tcpRtoMax.rrd
>>>>>> power,tcpRttUpdate.rrd
>>>>>> power,tcpTimKeepaliveProbe.rrd
>>>>>> power,tcpTimRetransDrop.rrd
>>>>>> power,Tray0_PSU0.rrd <--- Valid
>>>>>> power,Tray0_PSU1.rrd <--- Valid
>>>>>> power,Tray99_PSU0.rrd <--- Valid
>>>>>> power,Tray99_PSU1.rrd <--- Valid
>>>>>> power,trlogpool.rrd
>>>>>> power,UDP_udpInDatagrams.rrd
>>>>>> power,udpInCksumErrs.rrd
>>>>>> power,udpOutDatagrams.rrd
>>>>>> power,vnet.rrd
>>>>>>
>>>>>> So I thought I would check my configs.
>>>>>> In xymonserver.cfg
>>>>>> From TEST2RRD= ,power=ncv,
>>>>>> From GRAPHS= ,power::9,
>>>>>> And further down
>>>>>> SPLITNCV_power="*:GAUGE"
>>>>>>
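A rough sketch of what a wildcard spec like the SPLITNCV_power setting above implies. It assumes the spec is a comma-separated list of name:TYPE pairs with * as a catch-all, and that each accepted name ends up in a file called <test>,<name>.rrd (which matches the file names listed earlier in this message). The helper names parse_splitncv, ds_type_for and rrd_filename are hypothetical and are not Xymon functions.

```python
def parse_splitncv(spec):
    """Split a 'name:TYPE,name:TYPE' spec into (name, TYPE) pairs."""
    rules = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, dstype = item.rpartition(":")
        rules.append((name, dstype))
    return rules


def ds_type_for(name, rules):
    """Exact names win; otherwise fall back to the '*' rule, if any."""
    wildcard = None
    for rule_name, dstype in rules:
        if rule_name == name:
            return dstype
        if rule_name == "*":
            wildcard = dstype
    return wildcard  # None means this sketch has no rule for the name


def rrd_filename(test, name):
    """Assumed naming scheme, inferred from the files listed in this thread."""
    return "%s,%s.rrd" % (test, name)


rules = parse_splitncv("*:GAUGE")
for name in ("Tray0_PSU0", "tcpOutAck"):
    print(rrd_filename("power", name), ds_type_for(name, rules))
```

Under these assumptions the wildcard accepts any name at all, so if stray name/value lines ever reach the power test, nothing in the spec would reject them.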
>>>>>> And in graphs.cfg
>>>>>> [power]
>>>>>> FNPATTERN power,(.*).rrd
>>>>>> TITLE Database Power Consumption Per Tray PSU
>>>>>> YAXIS Watts
>>>>>> -l 0
>>>>>> DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
>>>>>> LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
>>>>>> GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
>>>>>> GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
>>>>>> GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
>>>>>> GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
>>>>>>
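A small sketch of what the FNPATTERN above selects, using Python's re module as a stand-in for the regex engine Xymon uses. It assumes the pattern is applied as an unanchored search against each file name; the function name graphed_params is made up for illustration.

```python
import re

# Pattern copied verbatim from the [power] section above.
FNPATTERN = re.compile(r"power,(.*).rrd")


def graphed_params(filenames):
    """Return the captured parameter for every file name the pattern accepts."""
    params = []
    for fn in filenames:
        m = FNPATTERN.search(fn)
        if m:
            params.append(m.group(1))
    return params


files = [
    "power,tcpOutAck.rrd",
    "power,Tray0_PSU0.rrd",
    "power,Tray99_PSU1.rrd",
    "power,vnet.rrd",
    "vmstat.rrd",
]
print(graphed_params(files))
```

Every file that starts with "power," is accepted, valid or not, so the graph definition is only drawing what is on disk and the open question is why the extra files get created. As a side note, the dot before "rrd" is unescaped and matches any character; a stricter hypothetical pattern would be ^power,(.+)\.rrd$ but that would not hide the spurious files either.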
>>>>>> With luck I will get approval to recompile with the debugging
>>>>>> bug-fix, and we can get more info, but I thought it was interesting
>>>>>> that the extra entries show up in trends but not in the test.
>>>>>>
>>>>>> Regards
>>>>>> Vernon
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 13 March 2015 at 15:24, J.C. Cleaver <cleaver at terabithia.org>
>>>>>> wrote:
>>>>>>
>>>>>>> On Wed, March 11, 2015 5:51 pm, Jeremy Laidman wrote:
>>>>>>> > On 11 March 2015 at 14:18, Vernon Everett <everett.vernon at gmail.com> wrote:
>>>>>>> >
>>>>>>> >> About now, I am getting a little nervous adding send and expect, because
>>>>>>> >> unlike telnet and telnets, we are doing ldap and ldaps testing.
>>>>>>> >>
>>>>>>> >
>>>>>>> > That's understandable. A read through the code suggests that at least
>>>>>>> > in some places, an empty string is equivalent to an undefined string,
>>>>>>> > as the string length (shown in Sendlen in the debug output) is zero in
>>>>>>> > both cases. So until a patch is in place, a work-around might be to
>>>>>>> > define empty "send" and "expect" strings for those that have none.
>>>>>>> >
>>>>>>> > Any suggestions?
>>>>>>> >> I think we have some debug code update recommendations for JC though.
>>>>>>> >> :-)
>>>>>>> >>
>>>>>>> >
>>>>>>> > Here's my patch. I'll push this into the dev list for proposed
>>>>>>> > inclusion in a future release.
>>>>>>> >
>>>>>>> > --- lib/netservices.c.orig 2012-07-25 01:48:41.000000000 +1000
>>>>>>> > +++ lib/netservices.c 2015-03-12 11:18:18.000000000 +1100
>>>>>>> > @@ -328,9 +328,9 @@
>>>>>>> > dbgprintf("Service list dump\n");
>>>>>>> > for (i=0; (svcinfo[i].svcname); i++) {
>>>>>>> > dbgprintf(" Name : %s\n", svcinfo[i].svcname);
>>>>>>> > - dbgprintf(" Sendtext: %s\n", binview(svcinfo[i].sendtxt, svcinfo[i].sendlen));
>>>>>>> > + dbgprintf(" Sendtext: %s\n", svcinfo[i].sendtxt!=NULL?binview(svcinfo[i].sendtxt, svcinfo[i].sendlen):"[null]");
>>>>>>> > dbgprintf(" Sendlen : %d\n", svcinfo[i].sendlen);
>>>>>>> > - dbgprintf(" Exp.text: %s\n", binview(svcinfo[i].exptext, svcinfo[i].explen));
>>>>>>> > + dbgprintf(" Exp.text: %s\n", svcinfo[i].exptext!=NULL?binview(svcinfo[i].exptext, svcinfo[i].explen):"[null]");
>>>>>>> > dbgprintf(" Exp.len : %d\n", svcinfo[i].explen);
>>>>>>> > dbgprintf(" Exp.ofs : %d\n", svcinfo[i].expofs);
>>>>>>> > dbgprintf(" Flags : %d\n", svcinfo[i].flags);
>>>>>>> >
>>>>>>> > This produces "[null]" where we would have seen "(null)" on a GNU-based
>>>>>>> > OS, to differentiate between the two situations.
>>>>>>> >
>>>>>>> > In the meantime, you could compile a special version of xymond_rrd, and
>>>>>>> > run it manually on the same data channel as the real one, but have it
>>>>>>> > write its RRD files and log file to a different location. This shouldn't
>>>>>>> > interfere with your production Xymon. Here's one I prepared earlier that
>>>>>>> > works for me:
>>>>>>> >
>>>>>>> > sudo -u xymon mkdir /tmp/my-rrd-data/
>>>>>>> > sudo -u xymon xymoncmd /bin/sh -c 'XYMONTMP=/tmp;
>>>>>>> > /usr/lib/xymon/server/bin/xymond_channel --channel=data
>>>>>>> > --log=/tmp/my-rrd-data.log /path/to/xymond_rrd_debug_patch
>>>>>>> > --rrddir=/tmp/my-rrd-data/ --debug'
>>>>>>> >
>>>>>>> > This seems to show some really useful stuff that's relevant to solving
>>>>>>> > your problem. Some sample debug lines:
>>>>>>> >
>>>>>>> > 15306 2015-03-12 11:36:28 xymond_rrd_debug_patch: Got message 165619 @@data#165619/servername|1426120588.401891|172.16.0.1||servername|vmstat|sunos|ABC
>>>>>>> > ...
>>>>>>> > 15306 2015-03-12 11:36:28 Creating rrd /tmp/my-rrd-data//servername/vmstat.rrd
>>>>>>> > 15306 2015-03-12 11:36:28 RRD create param 00: 'rrdcreate'
>>>>>>> > 15306 2015-03-12 11:36:28 RRD create param 01: '/tmp/my-rrd-data//servername/vmstat.rrd'
>>>>>>> > 15306 2015-03-12 11:36:28 RRD create param 02: '-s'
>>>>>>> > 15306 2015-03-12 11:36:28 RRD create param 03: '300'
>>>>>>> > 15306 2015-03-12 11:36:28 RRD create param 04: 'DS:cpu_r:GAUGE:600:0:U'
>>>>>>> > 15306 2015-03-12 11:36:28 RRD create param 05: 'DS:cpu_b:GAUGE:600:0:U'
>>>>>>> > 15306 2015-03-12 11:36:28 RRD create param 06: 'DS:cpu_w:GAUGE:600:0:U'
>>>>>>> > ...
>>>>>>> > 15306 2015-03-12 11:39:42 Got 265 bytes
>>>>>>> > 15306 2015-03-12 11:39:42 xymond_rrd_debug_patch: Got message 165737 @@data#165737/servername|1426120782.080244|172.16.0.2||servername|trends||DEF
>>>>>>> > 15306 2015-03-12 11:39:42 startpos 216644, fillpos 216644, endpos -1
>>>>>>> > 15306 2015-03-12 11:39:42 Flushing '/servername/tcp.xopiy90404.parameter.rrd' with 1 updates pending, template 'sec'
>>>>>>> > 15306 2015-03-12 11:39:42 Want msg 165738, startpos 216644, fillpos 216644, endpos -1, usedbytes=0, bufleft=1884603
>>>>>>> >
>>>>>>> > J
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>> This is some excellent sleuthing! :)
>>>>>>>
>>>>>>> As I was poring over the thread (sorry, I've been out the last few
>>>>>>> days), I failed to take note of the SPARC-Enterprise-T2000 in the output.
>>>>>>>
>>>>>>>
>>>>>>> The patch below should fix the immediate issue triggered by debug mode...
>>>>>>> letting us move on to the larger oddness. Unfortunately, I have a feeling
>>>>>>> there are other occasions where we're relying on GNU's printf printing
>>>>>>> "(null)" for a NULL string and thus might be caught by this. As I find
>>>>>>> them, I go ahead and work to put fixes in.
>>>>>>>
>>>>>>> In the meantime, this will be in 4.3.19 and can be patched directly from
>>>>>>> below.
>>>>>>>
>>>>>>>
>>>>>>> HTH,
>>>>>>>
>>>>>>> -jc
>>>>>>>
>>>>>>>
>>>>>>> --- lib/netservices.c (revision 7598)
>>>>>>> +++ lib/netservices.c (working copy)
>>>>>>> @@ -81,9 +81,9 @@
>>>>>>> unsigned char *inp, *outp;
>>>>>>> int i;
>>>>>>>
>>>>>>> - if (!buf) return NULL;
>>>>>>> + if (result) xfree(result);
>>>>>>> + if (!buf) { result = strdup("[null]"); return result; }
>>>>>>>
>>>>>>> - if (result) xfree(result);
>>>>>>> if (buf && (buflen == 0)) buflen = strlen(buf);
>>>>>>> result = (char *)malloc(4*buflen + 1); /* Worst case: All binary */
>>>>>>>
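For anyone who does not read C, a loose Python sketch of the behaviour the patch above gives binview(): a missing buffer yields the literal "[null]" instead of being handed to printf, and a zero length falls back to the buffer's own length. None stands in for a C NULL pointer here. The escaping of non-printable bytes as \xNN is an assumption based only on the "4*buflen" worst-case comment, not on the real function body.

```python
def binview(buf, buflen=0):
    """Return a printable view of a bytes buffer, or '[null]' if there is none."""
    if buf is None:
        return "[null]"          # what the patch returns for a NULL buffer
    if buflen == 0:
        buflen = len(buf)        # mirrors 'buflen = strlen(buf)' in the patch
    out = []
    for byte in buf[:buflen]:
        if 32 <= byte < 127:
            out.append(chr(byte))
        else:
            out.append("\\x%02x" % byte)  # assumed escape format, 4 chars per byte
    return "".join(out)


print(binview(None))
print(binview(b""))
print(binview(b"HEAD / HTTP/1.0\r\n"))
```

The point of the "[null]" marker is that an undefined string and an empty string now look different in the debug output, which they would not if both were printed as nothing.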
>>>>>> _______________________________________________
>>>>>>> Xymon mailing list
>>>>>>> Xymon at xymon.com
>>>>>>> http://lists.xymon.com/mailman/listinfo/xymon
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> "Accept the challenges so that you can feel the exhilaration of
>>>>>> victory"
>>>>>> - General George Patton
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>