Outliers In Data Mining


Abstract- Outlier detection is an active area of research in the data mining community. Finding outliers in a collection of patterns is a well-known problem in data mining. As a branch of data mining, outlier detection has many applications in data stream analysis and requires more attention. An outlier is a pattern that is dissimilar to the rest of the patterns in the data set. Detecting outliers and analyzing large data sets can lead to the discovery of unexpected knowledge in areas such as fraud detection, telecommunications, web logs, and web documents. This paper aims to clarify the problem of detecting outliers over data streams and to describe specific techniques used for detecting outliers in streaming data.
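To make detection over a data stream concrete, here is a minimal Python sketch, assuming numeric values that arrive one at a time with a roughly Gaussian spread. It is not a technique from this paper: the class name `StreamingZScoreDetector`, the 3-standard-deviation threshold, and the 10-value warm-up are hypothetical choices. The running mean and variance are maintained with Welford's online algorithm, so the stream never has to be stored.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingZScoreDetector:
    """Hypothetical online detector: flags values far from the running mean.

    Mean and variance are updated with Welford's algorithm, so memory use is
    constant no matter how long the stream is.
    """
    threshold: float = 3.0  # assumed cut-off, in standard deviations
    warmup: int = 10        # assumed number of values to see before flagging
    n: int = 0              # values absorbed so far
    mean: float = 0.0       # running mean
    m2: float = 0.0         # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Score x against the values seen so far, then absorb it.

        Returns True if x lies more than `threshold` sample standard
        deviations from the running mean.
        """
        is_outlier = False
        # At least two values are needed for a sample standard deviation.
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_outlier = True
        # Welford's update of the running mean and sum of squares.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier


detector = StreamingZScoreDetector()
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 10.0, 10.2, 25.0, 10.1]
flags = [detector.update(x) for x in stream]
print([x for x, flagged in zip(stream, flags) if flagged])  # [25.0]
```

Every value, including flagged ones, is absorbed into the running statistics, so a large spike temporarily widens the accepted range. Skipping flagged values avoids this, but it makes the detector slower to follow genuine drift in the stream.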

INTRODUCTION
Data mining extracts hidden and useful information from data; it discovers valid, previously unknown, useful, and high-quality knowledge. Outlier detection is an important data mining task with many applications, and it deserves more attention from the data mining community. It is also an important step in data pre-processing, because this stage is required when processing and mining data from many application fields, such as industrial processes, transportation, ecology, public safety, and climatology. Outliers are data that can be considered anomalous for several reasons. Outlier detection techniques are used, for instance, to minimize the influence of outliers on the final model, or as a preliminary pre-processing stage before the information conveyed by a signal is processed. On the other hand, in …
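The pre-processing use described above can be sketched in Python under a simple parametric assumption: that the values come from a single univariate Gaussian. The hypothetical helpers below estimate the mean and standard deviation from the data, flag values more than `z_threshold` standard deviations from the mean (equivalently, values with low density under the fitted Gaussian), and fit the final model only on what remains. Here the "final model" is just a mean, and the threshold of 3 is an illustrative choice.

```python
import statistics
from typing import List, Sequence, Tuple


def gaussian_outlier_mask(values: Sequence[float], z_threshold: float = 3.0) -> List[bool]:
    """True where a value lies more than z_threshold standard deviations from the mean.

    Assumes the data were generated by one univariate Gaussian whose parameters
    are estimated from the data themselves (maximum-likelihood mean and std).
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values, mu)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > z_threshold for v in values]


def fit_after_cleaning(values: Sequence[float], z_threshold: float = 3.0) -> Tuple[float, List[float]]:
    """Drop flagged values, then fit the (toy) final model: the mean of what remains."""
    mask = gaussian_outlier_mask(values, z_threshold)
    kept = [v for v, is_out in zip(values, mask) if not is_out]
    return statistics.fmean(kept), kept


readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7, 20.0, 20.4, 19.9, 20.1, 35.0]
mean_clean, kept = fit_after_cleaning(readings)
print(statistics.fmean(readings), mean_clean)
```

On these readings the raw mean is about 21.19, while the mean after cleaning is about 20.04. Because the outlier itself inflates the estimated mean and standard deviation, a single outlier among n values can never reach a z-score above sqrt(n - 1) with the population standard deviation; in very small samples a threshold of 3 may therefore miss it, and robust estimates such as the median are a common remedy.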

Statistical Outlier Detection
Statistical outlier detection assumes that all data points have been generated by a certain statistical distribution and estimates the parameters of that distribution from the data. In this approach, outliers are points that have a low probability of being generated by the overall distribution. Statistical outlier detection is also known as the parametric approach, because the detection model is formulated by fitting the available data points to the assumed distribution. Yamanishi et al. [1] proposed a Gaussian mixture model in which each data point is assigned a score, and data points with high scores are declared outliers. A method for detecting outliers based on the general pattern within data points was proposed in [2]; it combines a Gaussian mixture model with a supervised method. Depth-based outlier detection [3] is a variant of statistical outlier detection. It searches for outliers at the border of the data space but is independent of statistical distributions. These techniques are generally suited to quantitative real-valued data sets or quantitative ordinal data distributions. In this approach, each data object in the data set is represented as a point in an n-D space and is assigned a depth. The data points are organized into convex hull layers according to their assigned depth, and objects with shallow depth values are declared outliers. Outliers are …
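The convex hull layering described above can be sketched for two-dimensional data. This is a minimal Python illustration of convex hull peeling, not the specific algorithm of [3]: the function names are hypothetical, the hull is computed with Andrew's monotone chain, and a point's depth is the index of the hull layer it lies on (1 for the outermost). Points lying on a hull edge are assigned to that layer, and `max_depth` sets how shallow a point must be to be reported.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """2-D cross product of OA x OB; > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull(points: List[Point]) -> List[Point]:
    """Convex hull vertices via Andrew's monotone chain (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def _on_boundary(p: Point, hull: List[Point]) -> bool:
    """True if p is a hull vertex or lies on a hull edge (exact test; best with exact coordinates)."""
    if len(hull) == 1:
        return p == hull[0]
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        if (_cross(a, b, p) == 0
                and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= p[1] <= max(a[1], b[1])):
            return True
    return False


def peeling_depths(points: Sequence[Point]) -> List[int]:
    """Depth of each point: 1 for the outermost convex hull layer, 2 for the next, and so on."""
    depth: Dict[int, int] = {}
    remaining = list(range(len(points)))
    layer = 0
    while remaining:
        layer += 1
        hull = _hull([points[i] for i in remaining])
        for i in remaining:
            if _on_boundary(points[i], hull):
                depth[i] = layer
        remaining = [i for i in remaining if i not in depth]
    return [depth[i] for i in range(len(points))]


def depth_outliers(points: Sequence[Point], max_depth: int = 1) -> List[int]:
    """Indices of points whose depth is at most max_depth (the shallow layers)."""
    return [i for i, d in enumerate(peeling_depths(points)) if d <= max_depth]


points = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (3, 3), (1, 3), (2, 2), (10, 10)]
print(peeling_depths(points))  # [1, 1, 2, 1, 2, 2, 3, 2, 3, 1]
print(depth_outliers(points))  # [0, 1, 3, 9]
```

The outermost layer contains every extreme point of the data, so on a small example like this it also includes ordinary boundary points such as the corners; the method is meant for larger data sets, where the shallow layers hold only a small fraction of the points. Computing convex hulls becomes expensive as dimensionality grows, which is why depth-based methods are mainly practical for low-dimensional data.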