
Re: Xymon 4.2.3 rrd-data.log shows xstrdup: Cannot dup NULL string



Here I am with some new data, because the problem still exists. I know
that the rrd-data daemon crashes with the "xstrdup: Cannot dup NULL
string" error. I have set up netapp.pl with $Hobbit_fd_lib::debug = 2;
and found that the sysstat output differs between filers; I don't know
whether this is the real cause of the crash.

orwell:/usr/lib/hobbit/server/ext # cat /var/lib/hobbit/tmp/netapp.sysstat.DEBUG.camelot
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write age   hit time  ty util                 in   out    in   out
 29%     0  7976     0    7976  3147  5098   3872   3104     0     0   3   96%  12%  T    8%      0     0     0     0     0     0

orwell:/usr/lib/hobbit/server/ext # cat /var/lib/hobbit/tmp/netapp.sysstat.DEBUG.noah
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s
                                  in   out   read  write  read write age   hit time  ty util                 in   out
  8%     0     0     0       0     1     6    986   1988     0     0 >60  100%  13%  T    9%      0     0     0     0
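
To make the difference between the two sysstat samples concrete, here is a minimal Python sketch that only counts the whitespace-separated values in each data row. The rows are copied from the two DEBUG files above with the mail line wrap undone; the helper name count_fields is hypothetical and is not part of netapp.pl or Xymon. It shows that the camelot row carries two more values than the noah row (matching the extra "iSCSI kB/s" in/out columns in its header), but it does not prove that this is what makes the RRD worker crash.

```python
def count_fields(sample_row: str) -> int:
    """Count whitespace-separated values in one unwrapped sysstat data row."""
    return len(sample_row.split())


# Data rows copied from the two DEBUG files, with the mail wrap undone.
camelot_row = ("29% 0 7976 0 7976 3147 5098 3872 3104 0 0 "
               "3 96% 12% T 8% 0 0 0 0 0 0")
noah_row = ("8% 0 0 0 0 1 6 986 1988 0 0 "
            ">60 100% 13% T 9% 0 0 0 0")

camelot_n = count_fields(camelot_row)
noah_n = count_fields(noah_row)

print(f"camelot: {camelot_n} fields, noah: {noah_n} fields, "
      f"difference: {camelot_n - noah_n}")
# prints "camelot: 22 fields, noah: 20 fields, difference: 2"
```

Any parser that picks sysstat values by fixed position would therefore need to handle both row lengths.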

The other files (e.g. /var/lib/hobbit/tmp/netapp.xtstats.DEBUG.camelot)
also show a change in output: the block that used to come at the end of
the file now comes first. So the file now begins with:

system:system:nfs_ops:3190/s
system:system:cifs_ops:0/s
system:system:http_ops:0/s
system:system:fcp_ops:0/s
system:system:iscsi_ops:0/s
system:system:read_ops:619/s
system:system:write_ops:144/s
system:system:net_data_recv:4187KB/s
system:system:net_data_sent:23328KB/s
system:system:disk_data_read:5493KB/s
system:system:disk_data_written:6156KB/s
system:system:cpu_busy:10%
system:system:avg_processor_busy:10%
system:system:total_processor_busy:20%
system:system:num_processors:2
system:system:time:1244021254s
system:system:uptime:1048085s
disk:2000001D:38B5ED6F:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:total_transfers:8/s
disk:2000001D:38B5ED6F:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:user_read_chain:3.60

Whereas on our pre-7.3.1.1 filers, the output ends with:

.....
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:guarenteed_read_latency:0us
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:guarenteed_read_blocks:0/s
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:guarenteed_write_latency:0us
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:guarenteed_write_blocks:0/s
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:0%
system:system:nfs_ops:0/s
system:system:cifs_ops:0/s
system:system:http_ops:0/s
system:system:dafs_ops:0/s
system:system:fcp_ops:0/s
system:system:iscsi_ops:0/s
system:system:net_data_recv:13KB/s
system:system:net_data_sent:47KB/s
system:system:disk_data_read:986KB/s
system:system:disk_data_written:1988KB/s
system:system:cpu_busy:8%
system:system:avg_processor_busy:5%
system:system:total_processor_busy:10%
system:system:num_processors:2
system:system:time:1244021255s
system:system:uptime:7436873s
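
The two xtstats listings can also be compared mechanically. The following Python sketch assumes each line has the shape object:instance:counter:value, where the instance itself may contain colons (as the disk lines do). The function names parse_xtstats and system_counters are hypothetical; this is not how netapp.pl or hobbitd/rrd/do_netapp.c actually parse the data, only a way to diff the samples quoted above.

```python
def parse_xtstats(lines):
    """Yield (object, instance, counter, value) tuples from xtstats lines."""
    for line in lines:
        line = line.strip()
        if line.count(":") < 3:
            continue  # skip blank lines, '.....' markers and malformed input
        obj, rest = line.split(":", 1)
        # The instance may contain colons, so split counter/value from the right.
        instance, counter, value = rest.rsplit(":", 2)
        yield obj, instance, counter, value


def system_counters(lines):
    """Return the counter names of the 'system' object, in file order."""
    return [c for o, _, c, _ in parse_xtstats(lines) if o == "system"]


# 'system' lines copied from the 7.3.1.1 sample (camelot).
new_sample = """
system:system:nfs_ops:3190/s
system:system:cifs_ops:0/s
system:system:http_ops:0/s
system:system:fcp_ops:0/s
system:system:iscsi_ops:0/s
system:system:read_ops:619/s
system:system:write_ops:144/s
system:system:net_data_recv:4187KB/s
system:system:net_data_sent:23328KB/s
system:system:disk_data_read:5493KB/s
system:system:disk_data_written:6156KB/s
system:system:cpu_busy:10%
system:system:avg_processor_busy:10%
system:system:total_processor_busy:20%
system:system:num_processors:2
system:system:time:1244021254s
system:system:uptime:1048085s
""".splitlines()

# 'system' lines copied from the pre-7.3.1.1 sample.
old_sample = """
system:system:nfs_ops:0/s
system:system:cifs_ops:0/s
system:system:http_ops:0/s
system:system:dafs_ops:0/s
system:system:fcp_ops:0/s
system:system:iscsi_ops:0/s
system:system:net_data_recv:13KB/s
system:system:net_data_sent:47KB/s
system:system:disk_data_read:986KB/s
system:system:disk_data_written:1988KB/s
system:system:cpu_busy:8%
system:system:avg_processor_busy:5%
system:system:total_processor_busy:10%
system:system:num_processors:2
system:system:time:1244021255s
system:system:uptime:7436873s
""".splitlines()

new = system_counters(new_sample)
old = system_counters(old_sample)
added = [c for c in new if c not in old]
removed = [c for c in old if c not in new]
print("added in 7.3.1.1:", added)      # ['read_ops', 'write_ops']
print("missing in 7.3.1.1:", removed)  # ['dafs_ops']
```

On the lines quoted in this mail, the 7.3.1.1 output adds read_ops and write_ops and no longer reports dafs_ops, in addition to the system block moving from the end of the file to the start. Whether any of these differences leads to the NULL string in the RRD worker is still an open question.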



2009/5/30 Peter Welter <peter.welter (at) gmail.com>:
> Addendum:
> After turning off 'netapp.pl' for all filers and selectively turning it
> on again, it appears that there are no problems with ONTAP 7.2.3 and
> 7.2.4. The error does not show up and all trending (including other
> data-dependent trending) shows no holes anymore.
>
> But these 7.3.1.1-filers are very important, so I have to turn the
> monitoring on again on this NetApp-cluster. Will see if debugging the
> perl script will give more relevant data.
>
> 2009/5/29 Peter Welter <peter.welter (at) gmail.com>:
>> Hello all,
>>
>> Last Friday, May 22, at 8:20 we finished the upgrade of our
>> NetApp filers (from version 7.2.3 to 7.3.1.1). These filers were (and
>> are) monitored by Xymon in combination with the perl-netapp-client.
>> Together, a great combo!
>>
>> However, since the upgrade I keep getting this error in
>> /var/log/hobbit/rrd-data.log:
>> ...
>> 2009-05-22 08:22:00 xstrdup: Cannot dup NULL string
>> 2009-05-22 08:22:00 Worker process died with exit code 6, terminating
>> ....
>>
>> This error appears every 5 minutes.
>>
>> Only one graph type is no longer trended since the upgrade: the
>> xtstats column, which delivers all statistics about each drive in the
>> filer (about 20 graphs). Sometimes it does trend some data, but
>> only for a very short time, let's say 5 or 15 minutes. Then for
>> hours, nothing.
>>
>> One filer has not been upgraded, but shows the same lack of trending.
>> But that may be because I have set it up with MultiThreading
>> (something that can be set using a parameter).
>>
>> Now I will change this to 1 (a separate process for each filer) to see
>> if the problem can be narrowed down, so I'll post an update on this
>> problem later this weekend.
>>
>> Regards,
>> Peter
>>
>> PS I do not know if this has to do with either Xymon or netapp.pl, but
>> since it is integrated into the Xymon source (hobbitd/rrd/do_netapp.c)
>> I think it should be posted here.
>>
>