A Method of Data Clustering for Detecting Outlier from K-Means Clusters

Muhammad Shaheen¹* and Abdullah²

¹Faculty of Engineering and Information Technology, Foundation University Islamabad, Pakistan; ²National University of Computer and Emerging Sciences, Peshawar, Pakistan.

*Correspondence: Muhammad Shaheen, Faculty of Engineering and Information Technology, Foundation University Islamabad, Pakistan; Email: dr.shaheen@fui.edu.pk 

ABSTRACT

Classification is one of the major functionalities of data mining. It is performed either by predicting unknown class labels on the basis of previously labeled data, or by grouping a dataset on the basis of some implicit similarity measure. Clustering works on unlabeled (unsupervised) datasets and partitions them into groups on the basis of a measure such as the Euclidean distance used in K-Means clustering. The performance of K-Means can be significantly affected by outliers, yet the K-Means algorithm does not handle outliers. This paper proposes a modification of the K-Means algorithm that adds outlier detection based on a threshold value. The threshold, named clus_span, is computed by taking the distance of each point from every other point and dividing it by the total number of points. All points of a dataset that do not satisfy the minimum threshold are considered outliers. The modified K-Means is tested on a benchmark dataset for identification of outliers and compared with the existing K-Means algorithm in terms of accuracy. An improvement in performance is observed.
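The idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula for clus_span or the rule used to compare a point against it, so the sketch makes two assumptions that are not taken from the paper: clus_span is computed per cluster as the mean distance over all pairs of points in that cluster, and a point is flagged as an outlier when its distance to its cluster centroid exceeds clus_span. The function names `k_means` and `detect_outliers`, the sample points, and the initial centroids are hypothetical and chosen only for illustration.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Plain K-Means with Euclidean distance; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

def clus_span(members):
    """ASSUMPTION: mean pairwise distance within one cluster.

    The abstract only says the pairwise distances are divided by the
    number of points; normalizing by the number of pairs is our choice.
    """
    n = len(members)
    if n < 2:
        return 0.0
    total = sum(math.dist(p, q) for p in members for q in members)
    return total / (n * (n - 1))

def detect_outliers(points, centroids, labels):
    """ASSUMPTION: a point is an outlier if it lies farther from its
    centroid than the clus_span of its cluster."""
    outliers = []
    for j, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        span = clus_span(members)
        outliers.extend(p for p in members if math.dist(p, c) > span)
    return outliers

# Hypothetical data: two tight groups plus one stray point at (6, 6).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (6, 6),
          (20, 20), (20, 21), (21, 20), (21, 21)]
centroids, labels = k_means(points, centroids=[(0, 0), (20, 20)])
outliers = detect_outliers(points, centroids, labels)
print(outliers)  # [(6, 6)]
```

In this toy run the stray point is assigned to the first cluster by K-Means, pulls that centroid toward itself, and is then flagged because it lies farther from the centroid than the cluster's clus_span. A different normalization of clus_span or a different comparison rule would change which points are flagged.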

Journal of Engineering and Applied Sciences

December

Vol. 41, Iss. 1, pp. 01-63
