[Xymon] autofixing

White, Bruce bewhite at fellowes.com
Wed Apr 11 19:52:41 CEST 2012


Actually, I have found some cases where an "auto fix" script is helpful
(tool licenses going down, Oracle Listeners that die, etc.); however,
they are the exception, not the rule.  Also, they need to be coded very
carefully, to make sure they don't keep applying the fix when the
problem is not solved.  Want to bring down a server? Have 2000
non-functional Oracle Listener processes running, doing nothing!
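
One way to keep an auto-fix from repeating forever is to cap the number
of attempts within a time window and hand the problem to a human once
the cap is hit. The following is only a minimal sketch of that idea; the
FixGuard class, its parameters, and the default limits are hypothetical
and are not part of Xymon.

```python
import time


class FixGuard:
    """Allow at most max_attempts fix attempts per window_seconds.

    Hypothetical helper for illustration; not part of Xymon.
    """

    def __init__(self, max_attempts=3, window_seconds=3600):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.attempts = []  # timestamps of recent fix attempts

    def allow(self, now=None):
        """Return True if another fix attempt is permitted right now."""
        now = time.time() if now is None else now
        # Forget attempts that have fallen out of the time window.
        self.attempts = [
            t for t in self.attempts if now - t < self.window_seconds
        ]
        if len(self.attempts) >= self.max_attempts:
            return False  # stop fixing; a human should look at this
        self.attempts.append(now)
        return True
```

A restart script would call allow() before each restart and, when it
returns False, skip the restart and leave the alert for an admin.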

 

    .....Bruce

 

 

 

From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf
Of Larry Barber
Sent: Friday, April 06, 2012 4:34 PM
To: xymon at xymon.com
Subject: Re: [Xymon] autofixing

 

I've tried looking at Google, but can't seem to come up with a good
search phrase. What I mainly get is articles about various tools that
will auto repair various Microsoft products. This is the kind of thing
that happens once you start on the auto repair bandwagon: buy some
software, then buy some more software to keep the first program running,
then buy some more to keep the second running, then ....

Thanks,
Larry Barber

On Fri, Apr 6, 2012 at 4:31 PM, Larry Barber <lebarber at gmail.com> wrote:

Resending to the list; Gmail seems to be hiding the "reply to all".

Thanks,
Larry Barber

 

On Fri, Apr 6, 2012 at 4:28 PM, Larry Barber <lebarber at gmail.com> wrote:

The kinds of things that you can automate should be handled routinely,
not be triggered by an alert from your monitoring tool. If you have logs
growing so fast that they are filling up your file system, you should
find out what is filling them up and why, and then fix that. Automatic
log rotation and compression should be done by a tool like logrotate,
not Xymon or any other monitoring tool. You shouldn't be using a
monitoring tool to trigger routine maintenance; it simply causes
unnecessary alerts that cause problems in other areas.

Thanks,
Larry Barber

 

On Fri, Apr 6, 2012 at 4:06 PM, KING, KEVIN <KK1051 at att.com> wrote:

Larry,

 

Some auto correcting is not bad.  Back in the Big Brother days I had a
datacenter and a team of folks. We managed to the "yellow" alerts. I had
folks correct and build scripts to address the things that brought on
the yellow, so we very rarely saw any red.

 

Now the things you can automate are the disk full kind of things. If
that happens you can run a script to clean up logs, compress them, and
so on. This was usually handled by managing the yellow. There would be a
script in place to keep the space below the yellow trigger. So if you
got a red it was usually a big temp file or something that would get
cleaned up shortly. So, say, on the red alert you could have it run the
cleanup script rather than waiting for your cron to do the normal
cleanup.
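
A cleanup script of the kind described here might compress log files
that have not been written to for a while. The following is only a rough
sketch under stated assumptions: the function name, the ".log" suffix,
and the one-day age threshold are all hypothetical, and it is not the
script used at the time.

```python
import gzip
import os
import shutil
import time


def compress_old_logs(directory, min_age_seconds=86400, now=None):
    """Gzip *.log files in directory not modified for min_age_seconds.

    Returns the names of the files that were compressed.
    Hypothetical helper; names and thresholds are assumptions.
    """
    now = time.time() if now is None else now
    compressed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.endswith(".log") or not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) < min_age_seconds:
            continue  # recently written, may still be in use
        # Write the compressed copy first, then remove the original.
        with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)
        compressed.append(name)
    return compressed
```

The same function could be called from cron for the routine cleanup and
from an alert script when a red comes in.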

 

Now on other issues it really depends on what the alert is about. You
cannot automate everything economically. At some point it is cheaper and
faster to put a human in the loop. I did have a script that would take
the e-mail response to the alert, parse the message, and do the work.
This was back in the day with the RIM pagers. So when you got an alert,
you replied to it with "run clean script on host". The reply e-mail was
parsed by the same script we were using to acknowledge the alert. It
would parse the reply and run a clean script. This let my admins work
issues while away from a PC or network connection.
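
Parsing a reply such as "run clean script on host" could look roughly
like the following. This is only an illustrative sketch, not the
original script: the parse_reply function, the ALLOWED_ACTIONS
allow-list, and the exact command grammar are assumptions.

```python
import re

# Hypothetical allow-list: only these actions may be requested by e-mail.
ALLOWED_ACTIONS = {"clean"}

# Matches a line of the form "run <action> script on <host>".
COMMAND_RE = re.compile(
    r"^\s*run\s+(\w+)\s+script\s+on\s+([A-Za-z0-9.-]+)\s*$",
    re.IGNORECASE,
)


def parse_reply(body):
    """Return (action, host) for the first allowed command, else None."""
    for line in body.splitlines():
        match = COMMAND_RE.match(line)
        if match and match.group(1).lower() in ALLOWED_ACTIONS:
            return match.group(1).lower(), match.group(2).lower()
    return None
```

Restricting the actions to a fixed list means a reply can only trigger
scripts that were reviewed in advance, never arbitrary commands.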

 

I do hear and agree with your concerns. A blanket statement from
managers who do not have a full understanding of all the elements is a
rough thing to swallow. But their heart is in the right spot :)

 

I guess in a rather long, rambling way I am saying that you learn and
tune your systems. Address recurring issues so they do not recur. Then
watch for the next thing to be addressed.

 

 

-Kevin

 

 

From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf
Of Larry Barber
Sent: Friday, April 06, 2012 1:43 PM
To: xymon at xymon.com
Subject: [Xymon] autofixing

 

My management has gotten the idea that we should be automating the
repair processes on our servers. They want things set up so that when a
fault is detected a script is run that attempts to repair it. I've tried
to convince them that this is a profoundly wrong-headed idea, but I'm
not having much luck. Do any of you know of any articles or resources
that might help convince them? 

Thanks,
Larry Barber

 

 

 


 
Bruce White
Senior Enterprise Systems Engineer | Phone: 1-630-671-5169 | Fax: 630-893-1648 | bewhite at fellowes.com | http://www.fellowes.com/
 
 
 
 