<div>I had never received any alerts about messages being truncated. After disabling the non-prod clients, I started receiving alerts about messages being truncated. I adjusted these values as you specified below, and things look good now. Tomorrow I'll enable the non-prod servers again and see whether this was the original culprit. Thanks!</div>

<div><br></div><div><br><br><div class="gmail_quote">On Mon, Jan 18, 2010 at 12:41 PM, Williams, Doug (Consultant-RIC) <span dir="ltr"><<a href="mailto:Doug.Williams@rhd.com">Doug.Williams@rhd.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">It seems to me your clients' data is being truncated.  Try modifying these settings in your hobbitserver.cfg.  You may want to set them to an appropriate size for your Xymon server.  I have Xymon running on pretty beefy servers, so I set these incredibly high, even though they may exceed what Xymon actually allows (it is not hurting me).  Restart the Hobbit server after making the change to hobbitserver.cfg.<br>

<br>

<br>

<br>

MAXMSG_STATUS=30000000<br>

MAXMSG_CLIENT=30000000<br>

MAXMSG_DATA=30000000<br>

<div class="im"><br>

<br>

-----Original Message-----<br>

From: Chris Naude [mailto:<a href="mailto:chris.naude.0@gmail.com">chris.naude.0@gmail.com</a>]<br>

Sent: Monday, January 18, 2010 2:21 PM<br>

To: <a href="mailto:hobbit@hswn.dk">hobbit@hswn.dk</a><br>

Subject: Re: [hobbit] False Process Down Alerts<br>

<br>

I've managed to stop the flood of false alerts. I removed all of my<br>

non-prod clients from the bb-hosts and shut off their client processes.<br>

The problem seems to be somehow related to the amount of data the Xymon<br>

server is trying to process.<br>

<br>

<br>

On Sun, Jan 17, 2010 at 5:08 PM, Chris Naude <<a href="mailto:chris.naude.0@gmail.com">chris.naude.0@gmail.com</a>><br>

wrote:<br>

<br>

<br>

        I have 7 clients running. Each client has a different name. They are all sending data to the primary Xymon server. The alerts report missing processes, full file systems, and msgs errors. Here is another sample of an unusual error. You can see the process list has a funky break in it.<br>

<br>

<br>

         Sun Jan 17 15:40:18 MST 2010 - Processes NOT ok<br>

<br>

</div>         yellow<br>

<div><div></div><div class="h5">Expected string COMMAND not found in ps output header<br>

<br>

          PID  PPID USER<br>

          STIM] S PRI  %CPU     TIME     VSZ COMMAND<br>

            0     0 root      Dec 14  S 127  0.16 00:40:00       0 swapper<br>

            1     0 root      Dec 14  R 152  0.09 00:01:21    2064 init<br>

           48     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           45     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           42     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           31     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           30     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           29     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           28     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           26     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

            5     0 root      Dec 14  R 152  0.00 00:00:02       0 signald<br>

            6     0 root      Dec 14  R 152  0.00 00:00:03       0 kmemdaemon<br>

           17     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           16     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           15     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           14     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           13     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           12     0 root      Dec 14  S 152  0.00 00:00:00       0 usbhubd<br>

           11     0 root      Dec 14  R 152  0.00 00:01:11       0 escsid<br>

           10     0 root      Dec 14  S -32  0.00 00:00:00       0 ttisr<br>

            9     0 root      Dec 14  R 152  0.00 00:01:27       0 ksyncer_daemon<br>

<br>

        7     0]root      Dec 14  R 152<br>

         0.00 00:]0:00       0 kai_daemon<br>

           50     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           47     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           44     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>

           41     0 root      Dec 14  S 152  0.00 00:00:00       0 net_str_cached<br>
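A minimal sketch of why a split header line can produce the "Expected string COMMAND not found in ps output header" alert. This is not Xymon's code: it assumes the server looks for the COMMAND column name on the first line of the ps section. The intact header (including the STIME column name) is a hypothetical reconstruction; the broken header is the one from the alert, split after USER.

```python
# Hypothetical check, for illustration only (NOT Xymon's implementation).
# Assumption: the expected column name must appear on the first (header) line.

def header_has_command(ps_section: str, column: str = "COMMAND") -> bool:
    """Return True if the first line of the ps section names the expected column."""
    lines = ps_section.splitlines()
    return bool(lines) and column in lines[0].split()

# Assumed intact header (reconstructed) followed by one process line.
intact = ("PID  PPID USER  STIME S PRI  %CPU     TIME     VSZ COMMAND\n"
          "    1     0 root  Dec 14  R 152  0.09 00:01:21    2064 init")

# Header as it arrived in the alert: broken after USER, "STIM]" on the next line.
broken = ("PID  PPID USER\n"
          "STIM] S PRI  %CPU     TIME     VSZ COMMAND\n"
          "    1     0 root  Dec 14  R 152  0.09 00:01:21    2064 init")

print(header_has_command(intact))  # True
print(header_has_command(broken))  # False
```

Because COMMAND still exists in the data but no longer sits on the first line, a check like this fails even though every process is running, which matches the false "Processes NOT ok" alerts in this thread.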

<br>

        On Sun, Jan 17, 2010 at 4:21 PM, Josh Luthman<br>

<<a href="mailto:josh@imaginenetworksllc.com">josh@imaginenetworksllc.com</a>> wrote:<br>

<br>

<br>

                Is there only one client sending data under this name?  I don't think you answered Lars' email.<br>

<br>

                What does the alert say, and what does the data show? A missing process? Too high a load?<br>

<br>

                Josh Luthman<br>


<br>

<br>

<br>

                On Sun, Jan 17, 2010 at 6:11 PM, Chris Naude<br>

<<a href="mailto:chris.naude.0@gmail.com">chris.naude.0@gmail.com</a>> wrote:<br>

<br>

<br>

                        The problem has suddenly become much, much worse. I verified with tcpdump that the data coming from the client is 100% correct. It seems something on the Xymon server side is not handling the client data correctly. Does anyone have any other ideas?<br>

<br>

</div></div><div><div></div><div class="h5">                        red 89%     /testdb3 (37771472% used) has reached the PANIC level (95%)<br>

<br>

                        Filesystem            1024-blocks  Used Available Capacity Mounted on<br>

                        /dev/vgtestdb1/lvol1    107844344 70901816 36942528    66%     /testdb1<br>

                        /dev/vgtestdb2/lvol1    35962064 25453128 10508936    71%     /testdb2<br>

                        /dev/vgtestdb4/lvol1    970909400 825006344 145903056    85%     /testdb4<br>

                        /dev/vgtestdb3/lv<br>

                        l1 ]  338788224 301016752 37771472    89%     /testdb3<br>

                        /dev/vgtestdb5/lvol1    179789048 150553912 29235136    84%     /testdb5<br>

                        /dev/vg00/lvol8       24580711    74501 24506210 1%     /home<br>

                        /dev/vg00/lvol4       10226680  6339283  3887397 62%     /opt<br>
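A minimal sketch of one way a nonsensical value such as "37771472% used" can arise from a split df line. The parser below is hypothetical (it is not Xymon's df parsing code) and simply assumes Capacity is always the fifth whitespace-separated field. In the corrupted /testdb3 line, the device name is cut off and a stray "]" adds an extra token, so the Available value 37771472 shifts into the Capacity slot.

```python
# Hypothetical positional df parser, for illustration only (NOT Xymon's code).
# Assumption: Capacity is always the 5th whitespace-separated field.

def capacity_field(df_line: str) -> str:
    """Return the 5th whitespace-separated field, or "" if the line is too short."""
    fields = df_line.split()
    return fields[4] if len(fields) > 4 else ""

intact = "/dev/vgtestdb1/lvol1    107844344 70901816 36942528    66%     /testdb1"

# The corrupted /testdb3 line from the alert: the device name was split and a
# stray "]" was inserted, adding one token before the numeric columns.
broken = "l1 ]  338788224 301016752 37771472    89%     /testdb3"

print(capacity_field(intact))  # 66%
print(capacity_field(broken))  # 37771472
```

Any parser that locates columns by position, rather than validating each line, is vulnerable to this kind of shift when the client data arrives damaged.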

<br>

<br>

                        On Sat, Jan 16, 2010 at 10:44 AM, Chris Naude<br>

<<a href="mailto:chris.naude.0@gmail.com">chris.naude.0@gmail.com</a>> wrote:<br>

<br>

<br>

                                That makes a lot of sense. I did have some issues with the startup scripts on HP-UX. I'll check it out later tonight. Hopefully I can get it fixed before it goes live. Thanks!<br>

<br>

<br>

                                On Sat, Jan 16, 2010 at 7:56 AM, Lars<br>

Ebeling <<a href="mailto:lars.ebeling@leopg9.no-ip.org">lars.ebeling@leopg9.no-ip.org</a>> wrote:<br>

<br>

<br>

                                        It looks like two instances of the client are writing to the file at the same time, or almost ;)<br>

<br>

<br>

                                        Lars<br>

<br>

                                                ----- Original Message -----<br>

                                                From: Chris Naude<br>

</div></div>&lt;<a href="mailto:chris.naude.0@gmail.com">chris.naude.0@gmail.com</a>&gt;<br>

<div><div></div><div class="h5">                                                To: <a href="mailto:hobbit@hswn.dk">hobbit@hswn.dk</a><br>

                                                Sent: Saturday, January 16, 2010 4:59 AM<br>

                                                Subject: [hobbit] False Process Down Alerts<br>

<br>

                                                I've run into a strange problem with my Xymon server. I noticed today that I'm receiving random false alerts for processes being down. When I look at the process list output in the alert, it looks as if the data coming from the clients isn't correct. Here is an example. Has anyone seen anything like this?<br>

<br>

                                                 9613  1944 root Jan 11  S 154  0.00 00:00:00    6128 cmclconfd -c<br>

                                                10389  1944 root Jan 11  S 154  0.00 00:00:00    6128 cmclconfd -c<br>

                                                 9794     1 oracle 10:55:57 S 154  0.00 00:00:0<br>

                                                  217600]oracleTEST (LOCAL=NO)<br>

                                                 1592     1 oracle Jan 11  S 154  0.00 00:00:11  217136 ora_mman_TEST<br>

                                                12751  1944 root Jan 11  S 154  0.00 00:00:00    6128 cmclconfd -c<br>

                                                 8965  1944 root Jan 11  S 154  0.00 00:00:00    6128 cmclconfd -c<br>

<br>

                                                11819     1 oracle Jan 12  S 154  0.00 00:00:07  217280 ora_j015_TEST<br>

                                                 2711     1 roo<br>

                                                      ]ec  4  S 120 0.04 00:02:16     868 /usr/sbin/xntpd<br>

                                                 3547     1 xymon Dec  4  S 168  0.00 00:00:43     268 /opt/xymon/client/bin/hobbitlaunch --config=/opt/xymon/client/etc/clientlaunch.cfg --log=/opt/xymon/client/logs/clientlaunch.log --pidfile=/opt/xymon/client/logs/clientlaunch.101.example.com.pid<br>

                                                 3728     1 root Dec  4  R 152  0.00 00:00:37    4208 /usr/sbin/stm/uut/bin/tools/monitor/WbemWrapperMonitor<br>

<br>

<br>

                                                Xymon version: 4.3.0-0.beta2<br>

                                                Xymon server: CentOS 5.4 32 bit<br>

<br>

                                                Client: HP-UX 11.31 Itanium<br>

<br>

                                                --<br>

                                                Chris Naude<br>

<br>

<br>

<br>

<br>

<br>


<br>

<br>

<br>

<br>

<br>


<br>

<br>

<br>

<br>

<br>

<br>


<br>

<br>

<br>

<br>

<br>


<br>

<br>

</div></div>

<br>

<br>

</blockquote></div><br><br clear="all"><br>-- <br>Chris Naude<br>

</div>