I've had a lot of success applying an anomaly detection algorithm to a problem at work. Without revealing anything sensitive, the idea is to flag unusual actions taken by users of the system as "anomalies" so that preventative action can be taken.
There are many ways to do anomaly detection. My data has some characteristics that made the Multivariate Gaussian approach a good fit, namely:
* most features follow a normal distribution
* anomalies have features with values at the extreme ends of the scale (i.e. a number of standard deviations away from the mean)
* feature values are interrelated (i.e. a large value in feature A tends to correspond to a large value in feature B)
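For concreteness, here is a minimal Octave sketch of the multivariate Gaussian approach. It is not the code I used at work: the data is synthetic, the function names `fit_gaussian` and `gaussian_density` are made up for illustration, and the rule for picking the threshold `epsilon` (flag the `k` lowest densities) is just one assumption among many possible choices.

```octave
1;  % mark this file as a script so functions can be defined inline

% Estimate the mean vector and covariance matrix from the rows of X (m x n).
function [mu, Sigma] = fit_gaussian(X)
  m = size(X, 1);
  mu = mean(X, 1);
  Xc = X - mu;                  % centre each feature (uses broadcasting)
  Sigma = (Xc' * Xc) / m;       % maximum-likelihood covariance estimate
end

% Multivariate Gaussian density evaluated at each row of X.
function p = gaussian_density(X, mu, Sigma)
  n = size(X, 2);
  Xc = X - mu;
  % Row-wise quadratic form (x - mu) * inv(Sigma) * (x - mu)'
  q = sum((Xc / Sigma) .* Xc, 2);
  p = exp(-0.5 * q) / sqrt((2 * pi)^n * det(Sigma));
end

% Synthetic stand-in data: two normally distributed, positively correlated features.
randn("state", 42);
Z = randn(1000, 2);
X = [Z(:, 1), 0.8 * Z(:, 1) + 0.6 * Z(:, 2)];
X(end, :) = [3, -3];            % hand-planted anomaly that breaks the correlation

[mu, Sigma] = fit_gaussian(X);
p = gaussian_density(X, mu, Sigma);

% ASSUMPTION: choose epsilon so that the k least likely examples are flagged.
k = 5;
sorted_p = sort(p);
epsilon = sorted_p(k + 1);
flagged = find(p < epsilon);

printf("flagged %d of %d examples\n", numel(flagged), size(X, 1));
```

Because the covariance matrix captures how the features move together, the planted point is scored as highly unlikely for going against the correlation, not only for being far from the mean on each feature.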
I didn't apply Principal Component Analysis, because I wanted to see how good the results were before complicating the solution. As it turns out, it does pretty well. I had 4271 examples, of which only 3 were labeled as known anomalies. Only 24 examples (about 0.56%) were predicted to be anomalies. All three known anomalies were found, and a further four were uncovered after hand labeling the remaining 21. Very nice indeed!
This results in a fantastic-seeming F1 score (0.998). The fallacy in the calculation is that I haven't hand labeled a representative sample of the examples predicted to be normal, i.e. I've made the dangerous assumption that every example predicted to be normal really is normal, so that there are no false negatives.
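To see how much the score leans on that assumption, and on which class is treated as positive, the arithmetic can be redone from the counts quoted above. This is a sketch only: it assumes, as the calculation above does, that none of the unflagged examples is an anomaly, and the helper name `f1_score` is made up for illustration.

```octave
1;  % mark this file as a script so functions can be defined inline

% F1 score from counts of true positives, false positives and false negatives.
function f1 = f1_score(tp, fp, fn)
  precision = tp / (tp + fp);
  recall    = tp / (tp + fn);
  f1 = 2 * precision * recall / (precision + recall);
end

total        = 4271;                 % all examples
flagged      = 24;                   % examples predicted to be anomalies
confirmed    = 3 + 4;                % known anomalies plus those found by hand labeling
false_alarms = flagged - confirmed;  % flagged examples that turned out to be normal
missed       = 0;                    % ASSUMPTION: no anomalies among the unflagged examples

% Treating "anomaly" as the positive class.
f1_anomaly = f1_score(confirmed, false_alarms, missed);

% Treating "normal" as the positive class.
f1_normal = f1_score(total - flagged - missed, missed, false_alarms);

printf("F1, anomaly as positive: %.3f\n", f1_anomaly);
printf("F1, normal as positive:  %.3f\n", f1_normal);
```

Under these assumptions the second figure comes out at about 0.998 and the first at about 0.45, which suggests the score quoted above treats normal examples as the positive class. Any anomaly later discovered among the unflagged examples would raise `missed` and lower both figures.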
Very rewarding, nonetheless. All of this was enabled by Octave, of course. Ahh: hist(X(:,2))
Jan 9, 2012
Machine Learning: Applying Anomaly Detection to Real World Problems
Posted 5:37 PM
Labels: ai, anomaly detection