[Xymon] autofixing

Larry Barber lebarber at gmail.com
Fri Apr 6 23:33:54 CEST 2012


I've tried looking at Google, but can't seem to come up with a good search
phrase. What I mainly get is articles about various tools that will auto
repair various Microsoft products. This is the kind of thing that happens
once you start on the auto repair bandwagon: buy some software, then buy
some more software to keep the first program running, then buy some more to
keep the second running, then ....

Thanks,
Larry Barber

On Fri, Apr 6, 2012 at 4:31 PM, Larry Barber <lebarber at gmail.com> wrote:

> Resending to the list, Gmail seems to be hiding the "reply to all".
>
> Thanks,
> Larry Barber
>
>
> On Fri, Apr 6, 2012 at 4:28 PM, Larry Barber <lebarber at gmail.com> wrote:
>
>> The kind of things that you can automate should be handled routinely, not
>> be triggered by an alert from your monitoring tool. If you have logs
>> growing so fast that they are filling up your file system, you should find
>> out what is filling them up and why, and then fix that. Automatic log
>> rotation and compression should be done by a tool like logrotate, not Xymon
>> or any other monitoring tool. You shouldn't be using a monitoring tool to
>> trigger routine maintenance; it simply causes unnecessary alerts that cause
>> problems in other areas.
>>
>> Thanks,
>> Larry Barber
>>
>>
>> On Fri, Apr 6, 2012 at 4:06 PM, KING, KEVIN <KK1051 at att.com> wrote:
>>
>>> Larry,
>>>
>>> Some auto-correcting is not bad. Back in the Big Brother days I had a
>>> datacenter and a team of folks. We managed to the “yellow” alerts. I had
>>> folks correct problems and build scripts to address the things that
>>> brought on the yellow, so very little red was ever seen.
>>>
>>> Now, the things you can automate are the disk-full kind of things. If
>>> that happens you can run a script to clean up logs, compress them, and so
>>> on. This was usually handled by managing the yellow: there would be a
>>> script in place to keep the space used below the yellow trigger. So if you
>>> got a red, it was usually a big temp file or something that would get
>>> cleaned up shortly. On the red alert you could, say, have it run the
>>> cleanup script rather than waiting for your cron job to do the normal
>>> cleanup.
>>>
>>> On other issues it really depends on what the alert is about. You
>>> cannot automate everything economically. At some point it is cheaper and
>>> faster to put a human in the loop. I did have a script that would take the
>>> e-mail response to an alert, parse the message, and do the work. This was
>>> back in the day of the RIM pagers. When you got an alert, you replied to
>>> it with “run clean script on host”. The reply e-mail was parsed by the
>>> same script we were using to acknowledge the alert, which would then run
>>> the clean script. This let my admins work issues while away from a PC or
>>> network connection.
>>>
>>> I do hear and agree with your concerns. A blanket statement from
>>> managers who do not have a full understanding of all the elements is a
>>> rough thing to swallow. But their heart is in the right spot :)
>>>
>>> I guess in a rather long, rambling way I am saying that you learn and
>>> tune your systems. Address recurring issues so they do not recur. Then
>>> watch for the next thing to be addressed.
>>>
>>> -Kevin
>>>
>>> *From:* xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] *On
>>> Behalf Of *Larry Barber
>>> *Sent:* Friday, April 06, 2012 1:43 PM
>>> *To:* xymon at xymon.com
>>> *Subject:* [Xymon] autofixing
>>>
>>> My management has gotten the idea that we should be automating the
>>> repair processes on our servers. They want things set up so that when a
>>> fault is detected a script is run that attempts to repair it. I've tried to
>>> convince them that this is a profoundly wrong-headed idea, but I'm not
>>> having much luck. Do any of you know of any articles or resources that
>>> might help convince them?
>>>
>>> Thanks,
>>> Larry Barber
>>>
>>
>>
>
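
For readers who want something concrete, the "keep the space below the
yellow trigger" script Kevin describes could look roughly like the sketch
below. It is not taken from the thread: the function names, the 90 percent
threshold, and the one-day age cutoff are all assumptions chosen for
illustration, and the percentage it computes may differ slightly from what
df or your monitoring client reports.

```python
import gzip
import shutil
import time
from pathlib import Path

# Hypothetical value; set it to whatever your own "yellow" disk threshold is.
YELLOW_PERCENT = 90.0


def percent_used(path):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def compress_old_logs(log_dir, min_age_days=1, now=None):
    """Gzip *.log files not modified for `min_age_days`; return the new paths."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    compressed = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        if log_file.stat().st_mtime > cutoff:
            continue  # modified recently, possibly still being written to
        target = log_file.with_name(log_file.name + ".gz")
        with open(log_file, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        log_file.unlink()  # remove the original only after the copy succeeded
        compressed.append(target)
    return compressed


def cleanup_if_needed(log_dir, threshold=YELLOW_PERCENT):
    """Compress old logs only when disk usage has reached the threshold."""
    if percent_used(log_dir) < threshold:
        return []
    return compress_old_logs(log_dir)
```

Run from cron, cleanup_if_needed("/some/log/dir") fits Larry's point about
routine maintenance; the same function could also be called from an alert
script, which is the approach Kevin describes for the red case.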