<div>

<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">I think what this really boils down to is some form of event correlation<br>mechanism, on top of which you then apply some heuristics (that's a

<br>fancy word for "guessing") to decide what is the core issue. </blockquote>

<div> </div>

<div> </div>

<div>| If you know of any products that are really good at handling this, I'd<br>| be interested to hear about them.<br> </div>

<div>Heuristics is poppycock in the datacenter.  Humans are so ridiculously good at correlating events the effort is completely useless to try and train a computer to guess.  Now, from an intellectual or research point of view that may not be the case, but I am pragmatic in the datacenter:  Useful, not interesting.

</div>

<div> </div>

<div>My thought to "solve" this problem is the idea of "scenario fingerprinting."  As I mentioned, trying to teach a computer to learn is futile, but instructing a computer to look for *known* conditions works perfectly.  Criminals and problems have a tendency to repeat themselves.  

</div>

<div> </div>

<div>So, rather than deal with "event correlation", I think a better approach would be an engine that could do state analysis with many rules for a single scenario. Perhaps it's semantics, but "event correlation" to me implies events over time, and I don't think you need the time parameter, only the view of the environment at an instant, the fingerprint.   If the scenario is recognized, then "react" by disabling and alerting appropriately.   

</div>

<div> </div>

<div>Example, say you lose a router in Europe and all the pings die across the pond (I am in North America).  Generate a "scenario alert" that described the scenario and disable all the routers/hosts over there.  Odds are if that router went down once,  it will go down again.

</div>

<div> </div>

<div>You leverage the ability of a human to correlate with the computers ability to "keep on the look-out" for "known offenders."  I think this methodlogy could also be applied to the RRD system stats.

</div>

<div> </div>

<div>Let the machines do what they are good at, following instructions, and let the humans do what they are good at, thinking.</div>

<div> </div>

<div>Scott</div></div>