Unsupervised Learning Techniques for Anomaly Detection

Anomaly Detection

"Anomaly" is a synonym for "outlier". Anomaly detection (or outlier detection) is the identification of rare items, events, or observations that raise suspicion by differing significantly from the majority of the data. Anomalous activity can often be traced to problems or rare events such as bank fraud, medical issues, structural defects, or malfunctioning equipment.

Unsupervised Anomaly Detection

Like other unsupervised learning techniques, e.g., K-means, Gaussian mixture models, and K-medians, unsupervised anomaly detection deals with unlabelled data. Much like those techniques, it works by figuring out the pattern that the unlabelled points follow.

Anomalies are then detected as the points that stand out from the trend set by the rest of the data. For example, a large family of unsupervised anomaly detection algorithms is built on clustering techniques.

Let us look at some of the techniques that can be used for Anomaly Detection in Unsupervised Learning:

Isolation Forest: Based on concepts derived from the Random Forest classifier, an Isolation Forest processes randomly subsampled data in a tree structure, splitting on randomly selected attributes. Samples that end up deeper in the tree are less likely to be anomalies because they required more splits to isolate; conversely, samples on shorter branches are easier to isolate and are therefore more likely to be anomalies.
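
As a minimal sketch of how this might look with scikit-learn's IsolationForest (the synthetic data, contamination fraction, and random seeds below are illustrative assumptions, not from the original):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Normal points clustered near the origin, plus a few injected outliers
X_normal = 0.3 * rng.randn(200, 2)
X_outliers = rng.uniform(low=-4, high=4, size=(10, 2))
X = np.vstack([X_normal, X_outliers])

# contamination is the expected fraction of anomalies in the data (assumed here)
clf = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
labels = clf.fit_predict(X)  # +1 for inliers, -1 for anomalies

print("Detected anomalies:", np.sum(labels == -1))
```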

Local Outlier Factor (LOF): LOF measures the local density deviation of a given data point relative to its neighbors. Samples whose local density is significantly lower than that of their neighbors are flagged as outliers.
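
A minimal sketch using scikit-learn's LocalOutlierFactor (the synthetic dataset, neighbor count, and contamination value are assumed for illustration):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
# A tight cluster of "normal" points plus two injected outliers
X = np.vstack([0.5 * rng.randn(100, 2), [[5.0, 5.0], [-4.0, 4.5]]])

# LOF compares each point's local density to that of its k nearest neighbors
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X)              # +1 inlier, -1 outlier
scores = -lof.negative_outlier_factor_   # larger score = more anomalous

print("Outlier indices:", np.where(labels == -1)[0])
```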

Mahalanobis Distance: The Mahalanobis distance is a multivariate distance metric that measures the separation between a point and a distribution, taking the covariance between features into account. This makes it a good fit for one-class classification and highly imbalanced datasets.
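
A minimal NumPy sketch of distance-based flagging (the data, the estimation of the mean and covariance from the sample itself, and the cutoff of 3 are all illustrative assumptions):

```python
import numpy as np

def mahalanobis_distance(X, mu, cov):
    """Distance of each row of X from the distribution N(mu, cov):
    sqrt((x - mu)^T * Sigma^-1 * (x - mu)), computed row-wise."""
    inv_cov = np.linalg.inv(cov)
    diff = X - mu
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, inv_cov, diff))

rng = np.random.RandomState(1)
X = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]], size=300)

# Estimate the distribution from the data, then score every point against it
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
d = mahalanobis_distance(X, mu, cov)

# Flag points beyond a chosen cutoff as anomalies (threshold is an assumption)
print("Anomalies:", np.sum(d > 3.0))
```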

Autoencoders: An autoencoder is a neural network trained to compress its input into a low-dimensional code and then reconstruct the input from that code, so it learns the expected, "normal" behavior of the data. When an outlier data point arrives, the autoencoder cannot encode it well; the reconstruction error is large, and the point is flagged as an anomaly.
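
A minimal sketch with Keras, assuming TensorFlow is installed; the layer sizes, synthetic data, training settings, and 95th-percentile threshold are all illustrative assumptions:

```python
import numpy as np
from tensorflow import keras

rng = np.random.RandomState(0)
X_train = rng.randn(1000, 20).astype("float32")  # assumed "normal" data
X_test = np.vstack([rng.randn(50, 20), 5 + rng.randn(5, 20)]).astype("float32")

# A small bottleneck autoencoder: compress 20 features down to 4 and back
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(4, activation="relu"),   # bottleneck
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(20, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, X_train, epochs=10, batch_size=32, verbose=0)

# Reconstruction error per sample; large error suggests an anomaly
recon = model.predict(X_test, verbose=0)
errors = np.mean((X_test - recon) ** 2, axis=1)
threshold = np.percentile(errors, 95)  # assumed cutoff; tune per dataset
print("Anomaly indices:", np.where(errors > threshold)[0])
```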

Supervised Anomaly Detection

Since supervised learning relies on labeled data, so do the techniques used to detect anomalies in such models. Detecting anomalies in labeled data can be much easier than doing so in unsupervised settings, and these techniques hold great potential to be automated and made more efficient.

As arduous as labeled data may be to collect, in most applications where such techniques are used the labels come bundled with various other parameters and variables. Studying these parameters makes the methods in this category much more effective at dealing with unseen data.

Let us look at some of the techniques that can be used to detect anomalies with supervised machine learning:

Support Vector Machines: SVMs employ multidimensional hyperplanes to separate observations, and they can also solve multi-class classification problems. For anomaly detection, the One-Class SVM variant is widely used when the training data belongs to a single class: the algorithm learns what is "normal" and then determines whether a new observation belongs to that group.
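
A minimal sketch with scikit-learn's OneClassSVM (the synthetic data and the kernel, gamma, and nu settings are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = 0.4 * rng.randn(200, 2)  # train only on "normal" observations
X_test = np.vstack([0.4 * rng.randn(20, 2), [[3.0, 3.0], [-3.0, 2.5]]])

# nu upper-bounds the fraction of training errors / tolerated outliers
ocsvm = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05)
ocsvm.fit(X_train)

# +1 = falls inside the learned "normal" region, -1 = anomaly
labels = ocsvm.predict(X_test)
print("Anomaly indices:", np.where(labels == -1)[0])
```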

K Nearest Neighbors: The underlying premise of the nearest-neighbor family is that similar observations lie close to one another, while outliers are typically solitary observations situated far from any cluster of similar observations. Points whose neighbors are unusually distant are therefore flagged as anomalies.
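
One common way to turn this premise into a detector is to score each point by its distance to its k-th nearest neighbor; the sketch below assumes scikit-learn's NearestNeighbors, with the data, k, and the percentile cutoff chosen purely for illustration:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = np.vstack([0.5 * rng.randn(150, 2), [[4.0, 4.0]]])  # one injected outlier

# Distance to the k-th nearest neighbor: solitary points far from any
# cluster get large scores
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own neighbor
distances, _ = nn.kneighbors(X)
scores = distances[:, -1]                        # distance to the k-th neighbor

threshold = np.percentile(scores, 99)  # assumed cutoff; tune per dataset
print("Anomaly indices:", np.where(scores > threshold)[0])
```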
