
Re: [hobbit] hobbitd_rrd is not looking good



Henrik Stoerner wrote:

On Fri, Nov 11, 2005 at 03:58:20AM -0500, Bob Ababurko wrote:


I have no real idea what I broke, but maybe someone can tell me. I made a change in my bb-services file, as I was trying to define a different smtp service that expected a different return value than the default smtp. I called it something atypical, and things started to turn red. When I clicked on the red faces, there were Internal Server Error messages.


Changing bb-services should not break things like that, so I'm pretty sure this is a coincidence. Or at least - you're not to blame for hobbitd crashing :-)



About that same time, hobbitd and hobbitd_rrd turned red. I immediately changed the values back to where they were (removed the additional smtp definition and the reference in bb-hosts) and then restarted hobbit. The hobbitd_rrd status does not seem to be coming back.

So right now, hobbitd_rrd is purple and when you click to get more information, it says 'Program Crashed Fatal Signal Caught'



If it is purple, it is safe to remove it with the command bb 127.0.0.1 "drop HOBBIT.SERVER.HOSTNAME hobbitd_rrd"

The reason it ends up being purple is because normally hobbitd_rrd will
not generate any status column. The only time it does is when it
crashes; it's a kind of "Mayday" signal to make sure you notice that
something bad has happened, and alert me to this.

You can always check the "ps" listing and see if there are any hobbitd_rrd
processes running - a standard install will have two of them, plus two
hobbitd_channel processes feeding them.

henrik (at) osiris:~$ ps -u hobbit
 PID TTY          TIME CMD
 10756 ?        00:00:00 hobbitlaunch
 10757 ?        00:02:00 hobbitd
 10762 ?        00:00:07 hobbitd_channel
 10763 ?        00:00:01 hobbitd_filestore
 10764 ?        00:00:00 hobbitd_channel
 10765 ?        00:00:01 hobbitd_channel
 10776 ?        00:00:00 hobbitd_alert
 10778 ?        00:00:00 hobbitd_history
 11581 ?        00:00:07 hobbitd_channel
 11582 ?        00:00:05 hobbitd_rrd
 11583 ?        00:00:01 hobbitd_channel
 11699 ?        00:00:01 hobbitd_channel
 11700 ?        00:00:05 hobbitd_client
 11584 ?        00:00:02 hobbitd_rrd
 21402 ?        00:00:00 sh
 21403 ?        00:00:00 vmstat
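The check described above can be scripted. This is only a minimal sketch: the helper name count_procs is made up for illustration, and it assumes the Hobbit server runs as the user "hobbit" and that the command name is the last column of the ps output, as in the listing above.

```shell
# Hypothetical helper: count the lines of a "ps" listing whose last
# column (the command name) is exactly the name given as $1.
count_procs() {
    awk -v name="$1" '$NF == name { n++ } END { print n + 0 }'
}

# Usage, assuming the server runs as user "hobbit":
#   ps -u hobbit | count_procs hobbitd_rrd
# A standard install should report 2, per the description above.
```

Matching on the exact command name avoids counting unrelated processes that merely contain the string, such as an editor that has a hobbitd_rrd log file open.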



In rrd-data.log, at around the time that this happened, there is an entry that says '2005-11-11 01:33:35 Worker process died with exit code 134, terminating'. I am not sure if that is related. I do not seem to have any COREFILES in my tmp dir, unless they were erased when I restarted hobbit. Probably not, but I don't know.
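A value of 134 is consistent with a crash rather than a normal exit. The sketch below is an illustration, not something taken from the Hobbit documentation: it assumes the logged number is either a shell-style status (128 plus the signal number) or a raw wait status (signal number in the low 7 bits, with 128 added when a core was dumped). Both readings give signal 6 for 134, which is SIGABRT on Linux and the BSDs. The helper name signal_from_status is hypothetical.

```shell
# Hypothetical helper: infer the signal number from a status value.
# Assumption: the value is 128 + N (shell convention), or a raw wait
# status with the "core dumped" bit (0x80) set. Prints 0 when no
# signal can be inferred under that assumption.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ] && [ "$status" -lt 256 ]; then
        echo $(( status & 127 ))
    else
        echo 0
    fi
}

# Usage:
#   signal_from_status 134      # prints the signal number
#   kill -l 6                   # prints that signal's name
```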



They are not erased automatically, so they ought to be there ... could
you run a find ~hobbit -name "core*"
and see if anything shows up ?
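Core file names vary between systems: some write "core" or "core.PID", while the BSDs commonly write "PROGRAM.core", which the pattern "core*" does not match. A slightly broader search, as a sketch only (the helper name find_cores is made up for illustration):

```shell
# Hypothetical helper: list regular files under directory $1 whose
# names look like core dumps, either "core*" or "*.core".
find_cores() {
    find "$1" -type f \( -name 'core*' -o -name '*.core' \) 2>/dev/null
}

# Usage, assuming the Hobbit user's home directory is ~hobbit:
#   find_cores ~hobbit
```

A core file found this way is normally examined with a debugger such as gdb, together with the binary that produced it.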



Regards, Henrik


Ok, maybe I have gotten mixed up about the name of the expected core file. I DO have a file called hobbitd_rrd.core, and it looks like it was created at the time of 'the incident', so I think it is what I am looking for. I was actually looking for something called COREFILE; I must have misread. Now I cannot seem to find the web page that showed what to do to review a core file in tmp. Does anyone know what I should be doing to read these files?

Now, is taking hobbitd_rrd out of the monitoring checks what I want? Don't I want/need it in there for a complete hobbit (fixed, of course)? I want my hobbit to be a healthy and fully functional hobbit. I guess I am curious what went wrong and how to fix it. Maybe this has something to do with the core file, which I need to figure out how to read so I can work out why it crashed. Am I right here? Sounds logical.

I do have two hobbitd_rrd processes running, but I only checked for two after removing hobbitd_rrd from being checked. I actually do not remember seeing two last night, but it is distinctly possible.

-Bob