[Xymon] Metrics reports on red/yellow duration? Unacked? Splunk?

john.r.rothlisberger at accenture.com
Tue Nov 26 17:14:26 CET 2013


Interesting... I just finished working on a Perl script that notifies us when someone acks an alert. This may not be exactly what you are looking for, but you can use it or change it to your liking.

The script is attached.  No, it's not perfect and there are probably lots of things that could be done differently... but it works.
You will need to change the following lines to suit your needs:
      From => '<sender>@<company>.com',
      To => '<recipient>@<company>.com',

I also just

I have named it ack_watch.pl and run it via cron every 5 minutes.
*/5 * * * * /home/xymon/bin/ack_watch.pl > /dev/null 2>&1

It reads the epoch time and duration of each entry in the acknowledge.log file and checks whether the ack end time is later than the current time. If it is, the script generates an email that looks like this:
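The end-time check can be sketched in a few lines of shell. This is a hypothetical illustration, not code from ack_watch.pl (which is only attached). It assumes each acknowledge.log line is tab-separated, with the ack time in epoch seconds in field 1 and the ack duration in minutes in field 3; `ack_end` and `ack_is_active` are made-up helper names, and the field layout should be checked against your own log.

```shell
#!/bin/sh
# Hypothetical helpers, not code from ack_watch.pl.
# ASSUMPTION: acknowledge.log lines are tab-separated, with
#   field 1 = time the ack was made (epoch seconds)
#   field 3 = ack duration in minutes
# Check this layout against your own acknowledge.log before relying on it.

# ack_end LINE: print the epoch time at which the ack expires
ack_end() {
    printf '%s\n' "$1" | awk -F'\t' '{ printf "%.0f\n", $1 + $3 * 60 }'
}

# ack_is_active LINE [NOW]: succeed if the ack ends after NOW
# (NOW defaults to the current epoch time)
ack_is_active() {
    now=${2:-$(date '+%s')}
    [ "$(ack_end "$1")" -gt "$now" ]
}
```

A cron job could loop over the log with `while IFS= read -r line; do ...; done < acknowledge.log` and mail only the entries for which `ack_is_active` succeeds.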


Report Time: 11/26/2013 08:30
Xymon Server:
The following alert(s) were recently acknowledged.

Server/Test: attbbydb1.msgs
   Ack at: 11/26/2013 08:23
   Ack ends: 11/29/2013 23:23
   Ack duration: 3 days 15 hours
   Alert color: yellow
   Ack reason: ACK TEST ONLY 87 hours
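The "Ack duration" line is simple arithmetic on the ack length. The 87-hour ack above is 5220 minutes, which is 3 days (4320 minutes) plus 15 hours (900 minutes). A minimal sketch, assuming the duration is available in minutes; `format_ack_duration` is a hypothetical name, and leftover minutes are dropped because the sample email shows none.

```shell
#!/bin/sh
# Hypothetical helper: turn an ack duration in minutes into the
# "N days N hours" form used in the sample email.
# Leftover minutes are discarded, matching the sample output.
format_ack_duration() {
    mins=$1
    days=$((mins / 1440))                 # 1440 minutes per day
    hours=$(( (mins % 1440) / 60 ))       # remaining whole hours
    echo "$days days $hours hours"
}
```

For the sample ack, `format_ack_duration 5220` prints `3 days 15 hours`.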

It also creates a temporary file named from the ack time plus the alert ID; the file exists only to avoid sending duplicate emails for the same ack. (Create a directory called ~server/tmp/ACK_WATCH first.)
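The duplicate-suppression idea can be sketched with one empty marker file per ack. This is a hypothetical illustration: `should_notify` is a made-up name, and the marker directory is passed in rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical sketch of the duplicate-suppression idea: one empty marker
# file per ack, named ACKTIME_ALERTID, in a directory created beforehand
# (the original uses ~server/tmp/ACK_WATCH).

# should_notify DIR ACKTIME ALERTID: succeed (and leave a marker) the first
# time an ack is seen; fail on later runs so no duplicate email goes out.
# Also fails if DIR is missing, so create it first.
should_notify() {
    marker="${1}/${2}_${3}"
    if [ -e "$marker" ]; then
        return 1                       # already mailed about this ack
    fi
    : > "$marker"                      # remember it for the next cron run
}
```

Because cron runs the script only every 5 minutes, the small gap between the existence check and the file creation is not a practical concern here.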

To keep the script from parsing a long history of acks, I have set it up so that once acknowledge.log contains 10 acks, the file is moved to an archive directory.
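The archiving step might look like the following minimal sketch. `rotate_ack_log` is a hypothetical name, the paths are placeholders, and it assumes that whatever writes acknowledge.log simply recreates the file on the next ack.

```shell
#!/bin/sh
# Hypothetical sketch of "archive acknowledge.log after N acks".
# The paths are placeholders; pass in your own log and archive directory.
# ASSUMPTION: whatever writes acknowledge.log recreates it on the next ack.

# rotate_ack_log LOG ARCHIVE_DIR [MAX]: once LOG holds MAX or more lines
# (default 10), move it to ARCHIVE_DIR with an epoch-time suffix
rotate_ack_log() {
    log=$1
    archive=$2
    max=${3:-10}
    [ -f "$log" ] || return 0          # nothing to rotate yet
    count=$(awk 'END { print NR }' "$log")
    if [ "$count" -ge "$max" ]; then
        mv "$log" "$archive/$(basename "$log").$(date '+%s')"
    fi
}
```

Counting lines with awk rather than `wc -l` avoids the leading spaces some `wc` implementations print.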

I don't know if this is the direction you were looking to go but it seemed appropriate.

Thanks,
John
Upcoming PTO:
None

_____________________________________________________________________
John Rothlisberger
IT Strategy, Infrastructure & Security - Technology Growth Platform
TGP for Business Process Outsourcing
Accenture
312.693.3136 office
_____________________________________________________________________

From: Betsy Schwartz [mailto:betsy.schwartz at gmail.com]
Sent: Tuesday, November 26, 2013 9:42 AM
To: Rothlisberger, John R.
Cc: xymon at xymon.com
Subject: Re: [Xymon] Metrics reports on red/yellow duration? Unacked? Splunk?

Belatedly: what I'm thinking about is how to get organization-wide metrics reports, for example "average time to ack yellows" or "time from ack to resolution".

I see that the data about color changes is in $XYMONHOME/data/hist, stored per host and test, and the data about acks is in $XYMONHOME/log/acknowledge.log,
so I'm thinking we can put that together with Splunk.
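Once alert start times (from the hist files) and ack times (from acknowledge.log) have been joined, whether in Splunk or by hand, "average time to ack" is plain arithmetic. The sketch below assumes a hypothetical intermediate file with one line per alert: the epoch time the alert went yellow or red, then the epoch time of its ack, separated by whitespace. Producing that file is the hard part and is not shown; `avg_time_to_ack` is a made-up name.

```shell
#!/bin/sh
# avg_time_to_ack: read "ALERT_START ACK_TIME" pairs (epoch seconds) on
# stdin and print the mean delay in whole minutes (truncated).
# The input format is hypothetical: build it by joining hist data
# with acknowledge.log entries.
avg_time_to_ack() {
    awk 'NF >= 2 { total += $2 - $1; n++ }
         END { if (n > 0) printf "%d\n", total / n / 60; else print "no data" }'
}
```

For example, two alerts acked after 10 and 30 minutes give an average of 20.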

Alternatively, the board knows about color and acktime, so it's possible to get real-time stats as below ("this alert has been yellow for N minutes"), but there's nothing to put that together over time, which is why I'm thinking of Splunk.
It would be great if Xymon's built-in reports knew about ACKs; we're very ack-driven around here.
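For the real-time side, one option is to pull the current board and compute each unacked alert's age. This sketch assumes a Xymon 4.3-style `xymondboard` query with `color=` and `fields=` filters, pipe-separated output, and an `acktime` of 0 for alerts nobody has acked; verify those details against the xymon(1) man page for your version. `unacked_ages` is a hypothetical name, and the function reads stdin so it can be tried on sample lines.

```shell
#!/bin/sh
# unacked_ages NOW: read hostname|testname|color|lastchange|acktime lines
# on stdin and print "host.test color minutes" for each line whose acktime
# is 0, i.e. (by assumption) alerts that have not been acknowledged.
unacked_ages() {
    awk -F'|' -v now="$1" '$5 == 0 {
        printf "%s.%s %s %d\n", $1, $2, $3, (now - $4) / 60 }'
}

# Example (assumed xymondboard syntax; adjust the server address):
#   xymon 127.0.0.1 "xymondboard color=red,yellow fields=hostname,testname,color,lastchange,acktime" \
#       | unacked_ages "$(date '+%s')"
```

Run from cron, its output could be appended to a file for Splunk or compared against an agreed response-time threshold.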


On Wed, Nov 13, 2013 at 9:50 AM, <john.r.rothlisberger at accenture.com> wrote:
I do this in an alert script:

# Query xymond once for this alert's status; the first line is |-separated
# (field 3 = current color, field 5 = epoch time the status last changed color)
STATUS=$(/home/xymon/server/bin/xymon 0 "xymondlog $BBHOSTSVC" | head -1)
ACTIVECOLOR=$(echo "$STATUS" | awk -F'|' '{print $3}')
ALERTACTIVE=$(echo "$STATUS" | awk -F'|' '{print $5}')
ACTIVE=$(date -d "@$ALERTACTIVE")          # GNU date: epoch -> readable date
NOW=$(date '+%s')
ALERTDIFF=$((NOW - ALERTACTIVE))           # seconds the alert has been active
ALERTTIME=$(awk -v S="$ALERTDIFF" 'BEGIN {printf "%d hours %d minutes", S/3600, (S%3600)/60}')

This eventually shows up in our email alert like this:
Alert Active Since: Tue Nov 12 11:28:52 CST 2013  (Duration of Alert 4 hours 1 minutes)

You could use the same logic to get what you want.

Thanks,
John

From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Betsy Schwartz
Sent: Wednesday, November 13, 2013 8:20 AM
To: xymon at xymon.com
Subject: [Xymon] Metrics reports on red/yellow duration? Unacked? Splunk?

My grand-boss is looking to set standards for how long we let reds and yellows go un-ACKed
and unresolved. There's a built-in report, but it seems to summarize total time red/yellow, whereas what we're really interested in is how long it's taking us to respond.

Has anyone done anything with this?
I'm wondering if feeding the ack logs into Splunk would let us work something up, or whether we should just try scraping this off the board.
Thoughts and code snippets welcome.
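If the ack log goes into Splunk, rewriting each line as key=value pairs lets Splunk's automatic search-time field extraction pick the fields up without a custom sourcetype. This is a hypothetical helper (`ack_to_kv`), and the field layout it uses is an unverified assumption: tab-separated, with the ack time (epoch) in field 1, the duration in minutes in field 3, host.test in field 6, the color in field 7 and the ack message in field 8. Confirm this against your own acknowledge.log.

```shell
#!/bin/sh
# ack_to_kv: rewrite acknowledge.log lines (stdin) as key=value lines that
# Splunk's automatic field extraction can pick up. Hypothetical helper.
# ASSUMED tab-separated layout (verify against your log):
#   1 = ack time (epoch), 3 = duration (minutes), 6 = host.test,
#   7 = color, 8 = ack message
ack_to_kv() {
    awk -F'\t' -v q="'" 'NF >= 8 {
        msg = $8
        gsub(/"/, q, msg)              # replace double quotes with single ones
        printf "acktime=%s duration_min=%s hosttest=%s color=%s msg=\"%s\"\n",
               $1, $3, $6, $7, msg
    }'
}
```

Something like `ack_to_kv < acknowledge.log > acks_kv.log`, with a Splunk file monitor on the output, would be one way to wire it up.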



-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ack_watch.txt
URL: <http://lists.xymon.com/pipermail/xymon/attachments/20131126/9b7fcf1d/attachment.txt>

