[Xymon] alerts.cfg GROUP not matching

James, Tim A. Tim.James at navient.com
Sat Aug 15 05:42:26 CEST 2020


I had all of this working, then something changed, and now most of the groups defined in my analysis.cfg file no longer alert.  I hope it wasn't the upgrade from 4.3.28 to 4.3.30, but I'm not ruling anything out.
I have sanitized the server name to foo.bar.com.

Analysis.cfg snippet:

HOST=foo*
        DISK /opt/sas 90 95 GROUP=sas_support
        DISK /opt/sas/9.4 90 95 GROUP=sas_support
        DISK /opt/sas/9.4/depot2 90 95 GROUP=sas_support
        DISK /opt/sas/saslanding 90 95 GROUP=sas_support
        DISK /opt/sas/saslanding/in 90 95 GROUP=sas_support
        DISK /opt/sas/saslanding/out 90 95 GROUP=sas_support
        DISK /opt/sas/sasmain 90 99 GROUP=sas_support
        DISK /opt/sas/sassecure 90 99 GROUP=sas_support
        DISK /opt/sas/sassecure/modelingcrm 95 98 GROUP=sas_support
        DISK /opt/sas/sassecure/servicing 95 99 GROUP=sas_support
        DISK /opt/sas/sassecure/servicing/SCRA/MOENDs 90 95 GROUP=sas_support
        DISK /opt/sas/saswork 50 70 GROUP=sas_support

Alerts.cfg snippet:

GROUP=sas_support SERVICE=disk COLOR=red # SAS Application support team
SCRIPT /usr/local/xymon-server/server/ext/Create_SN_Ticket_From_Xymon-YP2-RP1.sh sas_support FORMAT=SMS DURATION>30 REPEAT=24h stop

GROUP=sas_support SERVICE=disk COLOR=yellow # SAS Application support team
MAIL helpdesk at foo.com FORMAT=SMS DURATION<20 REPEAT=24h stop

Obligatory test from the terminal:

[/usr/local/xymon-server/server/etc]
--> ../bin/xymoncmd xymond_alert --test foo.bar.com disk --color=yellow --group=sas_support

00103435 2020-08-14 23:11:06 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 165
00103435 2020-08-14 23:11:06 Failed 'GROUP=sas_support SERVICE=disk COLOR=red' (group not in include list)
00103435 2020-08-14 23:11:06 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 168
00103435 2020-08-14 23:11:06 Failed 'GROUP=sas_support SERVICE=disk COLOR=yellow' (group not in include list)

--> ../bin/xymoncmd xymond_alert --test foo.bar.com disk --group=sas_support
00104898 2020-08-14 23:26:59 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 165
00104898 2020-08-14 23:26:59 Failed 'GROUP=sas_support SERVICE=disk COLOR=red' (group not in include list)
00104898 2020-08-14 23:26:59 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 168
00104898 2020-08-14 23:26:59 Failed 'GROUP=sas_support SERVICE=disk COLOR=yellow' (group not in include list)

However, the second ("red") test does match a rule further along in the alerts file, just not one with a GROUP definition.  The failure there is expected, since I didn't specify a duration.
00104898 2020-08-14 23:26:59 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 304
00104898 2020-08-14 23:26:59 *** Match with 'HOST=%^.* SERVICE=disk COLOR=red' ***
00104898 2020-08-14 23:26:59 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 305
00104898 2020-08-14 23:26:59 Failed 'SCRIPT /usr/local/xymon-server/server/ext/Create_SN_Ticket_From_Xymon-YP2-RP1.sh UNIX FORMAT=SMS DURATION>5 REPEAT=25h' (min. duration 0<301)

Now get this.  Here are two more examples from the alerts.cfg file:
GROUP=satellite SERVICE=disk #test comment
MAIL coworker at foo.com FORMAT=SCRIPT stop

GROUP=unix # Linux Team support (default contact)
MAIL unix-alert at lists.foo.com FORMAT=SMS DURATION<20 stop

And the respective tests from the terminal:
../bin/xymoncmd xymond_alert --test foo.bar.com disk --color=yellow --group=satellite
00104439 2020-08-14 23:21:49 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 147
00104439 2020-08-14 23:21:49 Failed 'GROUP=satellite SERVICE=disk' (group not in include list)

../bin/xymoncmd xymond_alert --test foo.bar.com disk --color=yellow --group=unix
00104535 2020-08-14 23:23:08 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 150
00104535 2020-08-14 23:23:08 *** Match with 'GROUP=unix' ***
00104535 2020-08-14 23:23:08 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 151
00104535 2020-08-14 23:23:08 *** Match with 'MAIL unix-alert at lists.foo.com FORMAT=SMS DURATION<20 stop' ***
00104535 2020-08-14 23:23:08 Mail alert with command 'mail unix-alert at lists.foo.com
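
To compare these runs side by side, the test output can be condensed to one line per rule. This is only a rough sketch, assuming the log format stays exactly as shown above; summarize_alert_test is a made-up helper name, not something that ships with Xymon.

```shell
#!/bin/sh
# Condense "xymond_alert --test" output to one result line per rule.
# summarize_alert_test is a hypothetical helper, not part of Xymon.
# It assumes the log layout shown in the test runs above.
summarize_alert_test() {
    awk '
        # Remember the rule line number from each "Matching ..." line.
        /against rule line [0-9]+/ { rule = $NF; next }
        # Report a failure together with the reason given in parentheses.
        / Failed / {
            reason = $0
            sub(/^.*\(/, "", reason)
            sub(/\)$/, "", reason)
            printf "line %s: FAILED (%s)\n", rule, reason
            next
        }
        # Report a successful match against the remembered rule line.
        / \*\*\* Match with / { printf "line %s: MATCHED\n", rule }
    '
}

# Sample input copied from the sas_support and unix test runs above.
summarize_alert_test <<'EOF'
00103435 2020-08-14 23:11:06 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 165
00103435 2020-08-14 23:11:06 Failed 'GROUP=sas_support SERVICE=disk COLOR=red' (group not in include list)
00104535 2020-08-14 23:23:08 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 150
00104535 2020-08-14 23:23:08 *** Match with 'GROUP=unix' ***
EOF
```

In practice the real test output would be piped into the function instead of the sample text, which makes it quicker to see that every GROUP rule with a SERVICE fails for the same reason while the bare GROUP=unix rule matches.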

I'm stumped.  Anyone out there have any idea what might be incorrect?


Tim James
Senior System Administrator
Navient

