[Xymon] rrd logs and graphs

Vernon Everett everett.vernon at gmail.com
Sat Mar 21 00:16:38 CET 2015


Hi Jeremy

That thought occurred to me, but I checked.
There is only one [power] entry in the graphs.cfg file.
And I put it there for this particular test.
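
For anyone wanting to repeat that check, here is a minimal sketch of counting section headers in graphs.cfg-style text. The function name and the sample text (including the [vmstat] stanza) are made up for illustration, and it does not follow any include lines, so treat it as a rough aid rather than how Xymon itself reads the file.

```python
import re
from collections import Counter

def count_sections(cfg_text):
    """Count how often each [name] section header appears in the given text."""
    counts = Counter()
    for line in cfg_text.splitlines():
        m = re.match(r"^\[([^\]]+)\]$", line.strip())
        if m:
            counts[m.group(1)] += 1
    return counts

# Hypothetical sample: only the [power] lines come from this thread.
sample = """\
[power]
    FNPATTERN power,(.*).rrd
    TITLE Database Power Consumption Per Tray PSU
[vmstat]
    TITLE CPU
"""

counts = count_sections(sample)
# Any section name defined more than once would show up here.
duplicates = {name: n for name, n in counts.items() if n > 1}
print(counts["power"], duplicates)
```

An empty duplicates dict means each graph name is defined only once in the text that was scanned.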

Would have made this one too easy if it was that. :-)

Regards
Vernon


On 20 March 2015 at 16:43, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:

> Vernon
>
> The power status page must refer to a different graph name in graphs.cfg
> with a different FNPATTERN.
>
> Click on the graphs images for each version to get the 4-graph view and
> compare the URLs.
>
> J
>
> On Fri, 20 Mar 2015 19:35 Vernon Everett <everett.vernon at gmail.com> wrote:
>
>> Hi all
>>
>> I was only back at the client today, and unfortunately have not managed
>> to get that patch in yet.
>> (As I mentioned before, it's a production system)
>>
>> However, I did notice something really odd.
>> I have focused my attention on the trends graphs, where I get all the
>> extra values, but it's not happening in the test itself, despite the
>> existence of the additional rrd files.
>>
>> Example.
>> I have something that plots the power usage of the PSUs on a NetApp
>> e-series.
>> There are 4 PSUs, output looks like this.
>>
>> Total power drawn- 487 Watts
>> Number of trays- 2
>> Tray power input details-
>>
>>    TRAY ID  POWER SUPPLY SERIAL NUMBER   INPUT POWER
>>    99       0                            145 Watts
>>    99       1                            151 Watts
>>    0        0                            99 Watts
>>    0        1                            92 Watts
>>
>> All good. And I have a graph with 4 lines. Min, Max, Curr and Avg values
>> are all there. It looks beautiful.
>> But go look at the power graph in trends, and it's ugly.
>> Heaps of additional data lines with no entries. All values are NaN.
>> And mixed in amongst the additional empty lines are the 4 valid ones.
>>
>> I look at the rrd files, and they are all there, even the bad ones.
>> Here's a few of them.
>> power,tcpListenDrop.rrd
>> power,tcpOutAck.rrd
>> power,tcpOutDataSegs.rrd
>> power,tcpOutRsts.rrd
>> power,tcpOutUrg.rrd
>> power,tcpOutWinProbe.rrd
>> power,tcpRetransSegs.rrd
>> power,tcpRtoMax.rrd
>> power,tcpRttUpdate.rrd
>> power,tcpTimKeepaliveProbe.rrd
>> power,tcpTimRetransDrop.rrd
>> power,Tray0_PSU0.rrd                  <--- Valid
>> power,Tray0_PSU1.rrd                  <--- Valid
>> power,Tray99_PSU0.rrd                 <--- Valid
>> power,Tray99_PSU1.rrd                 <--- Valid
>> power,trlogpool.rrd
>> power,UDP_udpInDatagrams.rrd
>> power,udpInCksumErrs.rrd
>> power,udpOutDatagrams.rrd
>> power,vnet.rrd
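
The four valid RRD names line up with the tray and PSU columns of the table quoted further up. A minimal sketch of how such a table could be turned into "name : value" lines of the kind the ncv handler reads follows. How the real test script formats its data is not shown in this thread, so the regex, the function name, and the assumption that the middle column is a PSU index are all illustrative.

```python
import re

# Sample rows copied from the status output quoted above.
status_text = """\
Total power drawn- 487 Watts
Number of trays- 2
Tray power input details-

   TRAY ID  POWER SUPPLY SERIAL NUMBER   INPUT POWER
   99       0                            145 Watts
   99       1                            151 Watts
   0        0                            99 Watts
   0        1                            92 Watts
"""

# One table row: tray id, PSU index, watts (assumed layout).
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+Watts\s*$")

def table_to_readings(text):
    """Return (name, watts) pairs for every line that looks like a table row."""
    readings = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            tray, psu, watts = m.groups()
            readings.append((f"Tray{tray}_PSU{psu}", int(watts)))
    return readings

readings = table_to_readings(status_text)
ncv_lines = [f"{name} : {watts}" for name, watts in readings]
total = sum(watts for _, watts in readings)
print("\n".join(ncv_lines))
```

The per-PSU values add up to the 487 Watts reported in the first line of the output, which is a handy sanity check on the parsing.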
>>
>> So I thought I would check my configs.
>> In xymonserver.cfg
>> From TEST2RRD= ,power=ncv,
>> From GRAPHS=  ,power::9,
>> And further down
>> SPLITNCV_power="*:GAUGE"
>>
>> And in graphs.cfg
>> [power]
>>     FNPATTERN power,(.*).rrd
>>     TITLE Database Power Consumption Per Tray PSU
>>     YAXIS Watts
>>     -l 0
>>     DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
>>     LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
>>     GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
>>     GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
>>     GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
>>     GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
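
As a rough illustration of what that FNPATTERN accepts, here is a minimal sketch that runs it over a few of the file names listed earlier. Xymon matches with PCRE, and Python's re module is used here only as a stand-in, which should behave the same for a pattern this simple. The tighter Tray-only pattern is hypothetical and not something from the thread. It would hide the stray lines on a graph, but it would not explain why the stray files exist or why the status page and trends differ.

```python
import re

# FNPATTERN copied from the [power] definition above.
fnpattern = re.compile(r"power,(.*).rrd")

# A few of the file names from the directory listing earlier in the message.
rrd_files = [
    "power,tcpListenDrop.rrd",
    "power,Tray0_PSU0.rrd",
    "power,Tray99_PSU1.rrd",
    "power,vnet.rrd",
]

def matched_params(pattern, names):
    """Return the first capture group for every file name the pattern matches."""
    params = []
    for name in names:
        m = pattern.search(name)
        if m:
            params.append(m.group(1))
    return params

# Every power,*.rrd file matches, valid or not.
all_params = matched_params(fnpattern, rrd_files)

# Hypothetical tighter pattern that only accepts the TrayN_PSUN files.
tray_only = re.compile(r"^power,(Tray\d+_PSU\d+)\.rrd$")
tray_params = matched_params(tray_only, rrd_files)

print(all_params)
print(tray_params)
```

So any page that builds the graph from every file matching the pattern will draw one line per power,*.rrd file it finds.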
>>
>> With luck I will get approval to recompile with the debugging bug-fix,
>> and we can get more info, but I thought it was interesting that the extra
>> entries show up in trends but not in the test.
>>
>> Regards
>> Vernon
>>
>>
>> On 13 March 2015 at 15:24, J.C. Cleaver <cleaver at terabithia.org> wrote:
>>
>>> On Wed, March 11, 2015 5:51 pm, Jeremy Laidman wrote:
>>> > On 11 March 2015 at 14:18, Vernon Everett <everett.vernon at gmail.com>
>>> > wrote:
>>> >
>>> >> About now, I am getting a little nervous adding send and expect,
>>> >> because unlike telnet and telnets, we are doing ldap and ldaps testing.
>>> >>
>>> >
>>> > That's understandable.  A read through the code suggests that at least
>>> > in some places, an empty string is equivalent to an undefined string, as
>>> > the string length (shown in Sendlen in the debug output) is zero in both
>>> > cases.  So until a patch is in place, a work-around might be to define
>>> > empty "send" and "expect" strings for those that have none.
>>> >
>>> >> Any suggestions?
>>> >> I think we have some debug code update recommendations for JC though.
>>> >> :-)
>>> >>
>>> >
>>> > Here's my patch.  I'll push this into the dev list for proposed
>>> > inclusion in a future release.
>>> >
>>> > --- lib/netservices.c.orig      2012-07-25 01:48:41.000000000 +1000
>>> > +++ lib/netservices.c   2015-03-12 11:18:18.000000000 +1100
>>> > @@ -328,9 +328,9 @@
>>> >         dbgprintf("Service list dump\n");
>>> >         for (i=0; (svcinfo[i].svcname); i++) {
>>> >                 dbgprintf(" Name      : %s\n", svcinfo[i].svcname);
>>> > -               dbgprintf("   Sendtext: %s\n", binview(svcinfo[i].sendtxt, svcinfo[i].sendlen));
>>> > +               dbgprintf("   Sendtext: %s\n", svcinfo[i].sendtxt!=NULL?binview(svcinfo[i].sendtxt, svcinfo[i].sendlen):"[null]");
>>> >                 dbgprintf("   Sendlen : %d\n", svcinfo[i].sendlen);
>>> > -               dbgprintf("   Exp.text: %s\n", binview(svcinfo[i].exptext, svcinfo[i].explen));
>>> > +               dbgprintf("   Exp.text: %s\n", svcinfo[i].exptext!=NULL?binview(svcinfo[i].exptext, svcinfo[i].explen):"[null]");
>>> >                 dbgprintf("   Exp.len : %d\n", svcinfo[i].explen);
>>> >                 dbgprintf("   Exp.ofs : %d\n", svcinfo[i].expofs);
>>> >                 dbgprintf("   Flags   : %d\n", svcinfo[i].flags);
>>> >
>>> > This produces "[null]" where we would have seen "(null)" on a GNU-based
>>> > OS, to differentiate between the two situations.
>>> >
>>> > In the meantime, you could compile a special version of xymond_rrd, and
>>> > run it manually on the same data channel as the real one, but have it
>>> > write its RRD files and log file to a different location.  This shouldn't
>>> > interfere with your production Xymon.  Here's one I prepared earlier that
>>> > works for me:
>>> >
>>> > sudo -u xymon mkdir /tmp/my-rrd-data/
>>> > sudo -u xymon xymoncmd /bin/sh -c 'XYMONTMP=/tmp; /usr/lib/xymon/server/bin/xymond_channel --channel=data --log=/tmp/my-rrd-data.log /path/to/xymond_rrd_debug_patch --rrddir=/tmp/my-rrd-data/ --debug'
>>> >
>>> > This seems to show some really useful stuff that's relevant to solving
>>> > your problem.  Some sample debug lines:
>>> >
>>> > 15306 2015-03-12 11:36:28 xymond_rrd_debug_patch: Got message 165619 @@data#165619/servername|1426120588.401891|172.16.0.1||servername|vmstat|sunos|ABC
>>> > ...
>>> > 15306 2015-03-12 11:36:28 Creating rrd /tmp/my-rrd-data//servername/vmstat.rrd
>>> > 15306 2015-03-12 11:36:28 RRD create param 00: 'rrdcreate'
>>> > 15306 2015-03-12 11:36:28 RRD create param 01: '/tmp/my-rrd-data//servername/vmstat.rrd'
>>> > 15306 2015-03-12 11:36:28 RRD create param 02: '-s'
>>> > 15306 2015-03-12 11:36:28 RRD create param 03: '300'
>>> > 15306 2015-03-12 11:36:28 RRD create param 04: 'DS:cpu_r:GAUGE:600:0:U'
>>> > 15306 2015-03-12 11:36:28 RRD create param 05: 'DS:cpu_b:GAUGE:600:0:U'
>>> > 15306 2015-03-12 11:36:28 RRD create param 06: 'DS:cpu_w:GAUGE:600:0:U'
>>> > ...
>>> > 15306 2015-03-12 11:39:42 Got 265 bytes
>>> > 15306 2015-03-12 11:39:42 xymond_rrd_debug_patch: Got message 165737 @@data#165737/servername|1426120782.080244|172.16.0.2||servername|trends||DEF
>>> > 15306 2015-03-12 11:39:42 startpos 216644, fillpos 216644, endpos -1
>>> > 15306 2015-03-12 11:39:42 Flushing '/servername/tcp.xopiy90404.parameter.rrd' with 1 updates pending, template 'sec'
>>> > 15306 2015-03-12 11:39:42 Want msg 165738, startpos 216644, fillpos 216644, endpos -1, usedbytes=0, bufleft=1884603
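
When reading debug output like this, it can help to split the "@@data" header into its parts. The sketch below does that for the first sample message. The field names are an assumption based on my reading of the xymond channel layout (timestamp, sender, origin, hostname, testname, classname, pagepaths) and are not confirmed anywhere in this thread. The function name is made up.

```python
from datetime import datetime, timezone

# Assumed field order for the data channel header; not confirmed in this thread.
FIELDS = ("timestamp", "sender", "origin", "hostname",
          "testname", "classname", "pagepaths")

def parse_data_header(line):
    """Split an '@@data#seq/host|...' header line into a dict of named fields."""
    marker, *rest = line.strip().split("|")
    channel, _, tail = marker.partition("#")   # "@@data", "165619/servername"
    seq, _, host = tail.partition("/")
    msg = {"channel": channel.lstrip("@"), "sequence": int(seq), "host": host}
    msg.update(zip(FIELDS, rest))
    return msg

# Header copied from the first sample debug line above.
header = ("@@data#165619/servername|1426120588.401891|172.16.0.1|"
          "|servername|vmstat|sunos|ABC")
msg = parse_data_header(header)

# Whole seconds of the epoch timestamp, shown in UTC.
when = datetime.fromtimestamp(int(msg["timestamp"].split(".")[0]), tz=timezone.utc)
print(msg["sequence"], msg["testname"], msg["classname"], when.isoformat())
```

The timestamp decodes to 00:36:28 UTC on 12 March 2015, which agrees with the 11:36:28 local time printed in the log line for a server running at +1100.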
>>> >
>>> > J
>>> >
>>>
>>>
>>> This is some excellent sleuthing! :)
>>>
>>> As I was poring through the thread (sorry, I've been out the last few
>>> days), I failed to take note of the SPARC-Enterprise-T2000 in the output.
>>>
>>>
>>> The patch below should fix the immediate issue triggered by debug mode...
>>> letting us move on to the larger oddness. Unfortunately, I have a feeling
>>> there are other occasions where we're relying on GNU's printf(NULL)
>>> printing that out and thus might be caught by this. As I find them, I go
>>> ahead and work to put fixes in.
>>>
>>> In the meantime, this will be in 4.3.19 and can be patched directly from
>>> below.
>>>
>>>
>>> HTH,
>>>
>>> -jc
>>>
>>>
>>> --- lib/netservices.c   (revision 7598)
>>> +++ lib/netservices.c   (working copy)
>>> @@ -81,9 +81,9 @@
>>>         unsigned char *inp, *outp;
>>>         int i;
>>>
>>> -       if (!buf) return NULL;
>>> +       if (result) xfree(result);
>>> +       if (!buf) { result = strdup("[null]"); return result; }
>>>
>>> -       if (result) xfree(result);
>>>         if (buf && (buflen == 0)) buflen = strlen(buf);
>>>         result = (char *)malloc(4*buflen + 1);  /* Worst case: All binary */
>>>
>>> _______________________________________________
>>> Xymon mailing list
>>> Xymon at xymon.com
>>> http://lists.xymon.com/mailman/listinfo/xymon
>>>
>>
>> --
>> "Accept the challenges so that you can feel the exhilaration of victory"
>> - General George Patton
>>
>


-- 
"Accept the challenges so that you can feel the exhilaration of victory"
- General George Patton