[hobbit] hobbitd status-board not available [SOLVED!]

David Gore David.Gore at mci.com
Thu Oct 13 21:37:31 CEST 2005


I am not sure whether I missed this earlier; I don't think I did, but it's possible.

Regardless, the problem has been resolved.

hobbitlaunch.log:2005-10-13 19:01:57 Could not get sem: No space left on device
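
To see how often this error shows up, a small helper can count the matching lines in the log. This is only a sketch: the function name count_sem_errors is made up for illustration, and the path to hobbitlaunch.log depends on where your logs directory lives.

```shell
# count_sem_errors LOGFILE
# Prints the number of lines in LOGFILE that contain the semaphore error
# quoted above. (Hypothetical helper name, not part of Hobbit.)
count_sem_errors() {
    grep -c 'Could not get sem: No space left on device' "$1"
}
```

Run it as `count_sem_errors /path/to/logs/hobbitlaunch.log`, replacing the path with your own log location.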

Solaris 9:

/etc/system:

set shmsys:shminfo_shmseg=10

# reboot # or init 6
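
Before rebooting, it may be worth confirming that the line really is active in the system file. The following is a minimal sketch, assuming a POSIX-compliant sed and tail and a plain decimal value; the function name get_shmseg is hypothetical. Lines beginning with `*` are comments in /etc/system, so they are skipped.

```shell
# get_shmseg FILE
# Prints the decimal value assigned to shmsys:shminfo_shmseg in FILE,
# or nothing if no active "set" line is found. If the setting appears
# more than once, only the last occurrence is printed.
# (Hypothetical helper name; hex values are not handled in this sketch.)
get_shmseg() {
    sed -n 's/^[[:space:]]*set[[:space:]][[:space:]]*shmsys:shminfo_shmseg[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}
```

With the setting above in place, `get_shmseg /etc/system` should print 10.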

Everything works well, including multi-host enables/disables.  No core dumps since making the change.

Thank you Henrik for all your hard work!


~David

*e-mail via SUSE Linux 9.3 and other open source tools.



David Gore wrote:
> David Gore wrote:
>>
>> Henrik Stoerner wrote:
>>> On Sat, Oct 08, 2005 at 04:08:57PM -0600, David Gore wrote:
>>>  
>>>> What does this message mean?  Typically we get this when disabling 
>>>> multiple hosts.  Is it a host resource issue, or is something not 
>>>> replying quickly enough?  We are on the snapshot from 03 October.  
>>>> This has been happening over many weeks and different snapshots.  
>>>> The OS is Solaris 9.
>>>>     
>>>
>>> It really points to a bug in the hobbitd daemon - it means that some
>>> task (usually bbdisplay) couldn't fetch the status information from
>>> the Hobbit server, which it uses to build the webpages.
>>>
>>> I'm somewhat alarmed if you have this problem with such a recent 
>>> snapshot. I know there was a bug in 4.1.1 (and earlier) that could 
>>> trigger this when disabling or renaming hosts, but that should not
>>> happen with the snapshot from 03 Oct.
>>>
>>>  
>>>> I am pretty sure these happen when people disable hosts and it fails. 
>>>> Although bb2.html shows them going to blue in the history, they 
>>>> do not show up on the enable/disable screen and usually show as 
>>>> failed when executing the disable.
>>>>     
>>>
>>> Interesting. I'll go over that particular piece of code again to
>>> see if I can come up with an explanation. If you have a way of
>>> triggering this, let me know - in that case, I'd like you to try out
>>> some things to make sure it is fixed.
>>>
>>>
>>> Regards,
>>> Henrik
>>>
>>>
>>> To unsubscribe from the hobbit list, send an e-mail to
>>> hobbit-unsubscribe at hswn.dk
>>>
>>>   
>> It is still happening with the latest 4.1.2 install.  A multi-host 
>> (~75+ hosts) disable worked, but then later on the enable it looks 
>> like hobbitd crashed:
>>
>> hobbit@hobbit:/export/home/hobbit/server> find . -name core
>> ./tmp/core
>> hobbit@hobbit:/export/home/hobbit/server> ls -al ./tmp/core
>> -rw-------   1 hobbit   other    13630500 Oct 11 16:46 ./tmp/core
>> hobbit@hobbit:/export/home/hobbit/server> file ./tmp/core
>> ./tmp/core:     ELF 32-bit MSB core file SPARC Version 1, from 'hobbitd'
>> hobbit@hobbit:/export/home/hobbit/server> gdb bin/hobbitd tmp/core
>> GNU gdb 6.0
>> Copyright 2003 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "sparc-sun-solaris2.9"...
>> Core was generated by `hobbitd --pidfile=/export/home/hobbit/server/logs/hobbitd.pid --restart=/export'.
>> Program terminated with signal 6, Aborted.
>> Reading symbols from /usr/lib/libresolv.so.2...done.
>> Loaded symbols for /usr/lib/libresolv.so.2
>> Reading symbols from /usr/lib/libsocket.so.1...done.
>> Loaded symbols for /usr/lib/libsocket.so.1
>> Reading symbols from /usr/lib/libnsl.so.1...done.
>> Loaded symbols for /usr/lib/libnsl.so.1
>> Reading symbols from /usr/lib/libc.so.1...done.
>> Loaded symbols for /usr/lib/libc.so.1
>> Reading symbols from /usr/lib/libdl.so.1...done.
>> Loaded symbols for /usr/lib/libdl.so.1
>> Reading symbols from /usr/lib/libmp.so.2...done.
>> Loaded symbols for /usr/lib/libmp.so.2
>> Reading symbols from /usr/platform/SUNW,Ultra-60/lib/libc_psr.so.1...done.
>> Loaded symbols for /usr/platform/SUNW,Ultra-60/lib/libc_psr.so.1
>> #0  0xff19fff8 in _libc_kill () from /usr/lib/libc.so.1
>> (gdb) bt
>> #0  0xff19fff8 in _libc_kill () from /usr/lib/libc.so.1
>> #1  0xff136cd8 in abort () from /usr/lib/libc.so.1
>> #2  0x00021080 in sigsegv_handler (signum=10) at sig.c:57
>> #3  <signal handler called>
>> (gdb)
>>
>> Can you give me directions on how I can do a relatively clean install 
>> and still retain all my historical information?
>>
>> ~David
>>
>>
>>
> It has cored several times now due to attempted multi-host 
> re-enables.  I cannot re-enable the hosts.  The last time was 5 hosts 
> with 1 test.  I am just going to let hobbit auto-enable them when 
> their disable time expires.  Additionally, the disable/enable web page 
> is not populated with any hosts for about ten minutes after the crash, 
> that includes the info page.
>
> ~David
>


