[Xymon] autofixing

Larry Barber lebarber at gmail.com
Fri Apr 6 23:31:21 CEST 2012


Resending to the list; Gmail seems to be hiding "reply to all".

Thanks,
Larry Barber

On Fri, Apr 6, 2012 at 4:28 PM, Larry Barber <lebarber at gmail.com> wrote:

> The kind of things that you can automate should be handled routinely, not
> be triggered by an alert from your monitoring tool. If you have logs
> growing so fast that they are filling up your file system, you should find
> out what is filling them up and why, and then fix that. Automatic log
> rotation and compression should be done by a tool like logrotate, not Xymon
> or any other monitoring tool. You shouldn't be using a monitoring tool to
> trigger routine maintenance; it simply causes unnecessary alerts that cause
> problems in other areas.
>
> Thanks,
> Larry Barber
>
>
> On Fri, Apr 6, 2012 at 4:06 PM, KING, KEVIN <KK1051 at att.com> wrote:
>
>> Larry,
>>
>> Some auto-correcting is not bad. Back in the Big Brother days I had a
>> datacenter and a team of folks. We managed to the “yellow” alerts. I had
>> folks fix problems and build scripts to address the things that brought on
>> the yellow, so very little red was ever seen.
>>
>> Now the things you can automate are the disk-full kind of things. If that
>> happens you can run a script to clean up and compress logs and so on. This
>> was usually handled by managing the yellow. There would be a script in
>> place to keep the space below the yellow trigger. So if you got a red it
>> was usually a big temp file or something that would get cleaned shortly. So
>> on the red alert you could, say, have it run the cleanup script rather than
>> waiting for your cron job to do the normal cleanup.
>>
>> Now on other issues it really depends on what the alert is about. You
>> cannot automate everything economically. At some point it is cheaper and
>> faster to put a human in the loop. I did have a script that would take the
>> e-mail response to the alert, parse the message, and do the work. This was
>> back in the day with the RIM pagers. When you got an alert, you replied to
>> it with “run clean script on host”. The reply e-mail was parsed by the same
>> script we were using to acknowledge the alert, and that script would then
>> run the clean script. This let my admins work issues while away from a PC
>> or network connection.
>>
>> I do hear and agree with your concerns. A blanket statement from managers
>> who do not have a full understanding of all the elements is a rough thing
>> to swallow. But their heart is in the right spot :)
>>
>> I guess in a rather long, rambling way I am saying that you learn and tune
>> your systems. Address recurring issues so they do not recur. Then watch for
>> the next thing to be addressed.
>>
>> -Kevin
>>
>> *From:* xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] *On
>> Behalf Of *Larry Barber
>> *Sent:* Friday, April 06, 2012 1:43 PM
>> *To:* xymon at xymon.com
>> *Subject:* [Xymon] autofixing
>>
>> My management has gotten the idea that we should be automating the repair
>> processes on our servers. They want things set up so that when a fault is
>> detected a script is run that attempts to repair it. I've tried to convince
>> them that this is a profoundly wrong-headed idea, but I'm not having much
>> luck. Do any of you know of any articles or resources that might help
>> convince them?
>>
>> Thanks,
>> Larry Barber
>>
>
>
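As an illustration of the yellow/red approach Kevin describes above, here is
a minimal Python sketch. It is not code from this thread: the thresholds, the
function names, and the *.log naming pattern are all assumptions made for the
example. It maps disk usage to a status color and compresses stale logs once
usage leaves green. In line with Larry's point, a job like this would
normally run from cron (or be replaced by logrotate) rather than be fired by
the monitoring tool itself.

```python
import gzip
import shutil
import time
from pathlib import Path

# Hypothetical thresholds for this example; tune them to your own setup.
YELLOW_PCT = 90.0
RED_PCT = 95.0


def disk_color(used_pct, yellow=YELLOW_PCT, red=RED_PCT):
    """Map a disk usage percentage to a status color."""
    if used_pct >= red:
        return "red"
    if used_pct >= yellow:
        return "yellow"
    return "green"


def used_percent(path):
    """Return the percentage of the file system holding `path` that is used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def compress_old_logs(log_dir, min_age_seconds=86400, now=None):
    """Gzip *.log files older than min_age_seconds; return the new paths."""
    now = time.time() if now is None else now
    compressed = []
    for log in sorted(Path(log_dir).glob("*.log")):
        if now - log.stat().st_mtime < min_age_seconds:
            continue  # recently written, probably still in use
        target = log.with_name(log.name + ".gz")
        with open(log, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        log.unlink()  # remove the original only after the copy succeeded
        compressed.append(target)
    return compressed


def maintain(path, log_dir):
    """Compress stale logs if usage is yellow or red; return the color seen."""
    color = disk_color(used_percent(path))
    if color != "green":
        compress_old_logs(log_dir)
    return color
```

A cron entry could call maintain("/var", "/var/log/myapp") every few minutes
(both paths are placeholders). If the returned color is still "red" after a
cleanup, that is the point at which a human should be put in the loop.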