A Method of Data Clustering for Detecting Outlier from K-Means Clusters

**Muhammad Shaheen¹* and Abdullah²**

Author Affiliations

¹Faculty of Engineering and Information Technology, Foundation University Islamabad, Pakistan; ²National University of Computer and Emerging Sciences, Peshawar, Pakistan.

*Correspondence: Muhammad Shaheen, Faculty of Engineering and Information Technology, Foundation University Islamabad, Pakistan; Email: dr.shaheen@fui.edu.pk

ABSTRACT

Classification is one of the major functionalities of data mining. It is performed either by predicting unknown class labels from previously labeled data or by grouping a dataset according to some implicit similarity measure. Clustering works on unlabeled datasets and partitions them into groups using a measure such as the Euclidean distance in K-Means clustering. The performance of K-Means can be significantly affected by outliers, which the standard K-Means algorithm does not handle. This paper proposes a modification to the K-Means algorithm that adds threshold-based outlier detection. The outlier threshold, named clus_span, is computed by taking the distance of each point from every other point and dividing it by the total number of points. All points in the dataset that do not satisfy this minimum threshold are considered outliers. The modified K-Means is tested on a benchmark dataset for outlier identification and compared with the existing K-Means algorithm in terms of accuracy, and it shows an improvement in performance.
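The abstract does not spell out the exact clus_span formula or whether it is applied per cluster, so the Python below is only a minimal sketch under stated assumptions. It runs standard K-Means, then, within each cluster, scores every point by its summed Euclidean distance to the cluster's points divided by the cluster size, takes clus_span as the mean of those scores, and flags a point as an outlier when its score exceeds `factor × clus_span`. The per-cluster scope, the averaging step, and the names `kmeans`, `clus_span_outliers` and `factor` are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the measure the abstract names for K-Means."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Standard (Lloyd's) K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clus_span_outliers(points: List[Point], labels: List[int], factor: float = 1.5) -> List[bool]:
    """Flag points whose score exceeds factor * clus_span within their cluster.

    Hypothetical reading of the abstract, not the paper's exact formula:
      score(i)  = sum of distances from point i to every point in its cluster / n
      clus_span = mean score over the cluster
    """
    flags = [False] * len(points)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        n = len(idx)
        score: Dict[int, float] = {
            i: sum(dist(points[i], points[j]) for j in idx) / n for i in idx
        }
        clus_span = sum(score.values()) / n
        for i in idx:
            flags[i] = score[i] > factor * clus_span
    return flags


data: List[Point] = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # cluster near the origin
    (10, 10), (10, 11), (11, 10), (11, 11),  # cluster near (10, 10)
    (14, 14),                                # a point far from its cluster-mates
]
labels = kmeans(data, k=2)
flags = clus_span_outliers(data, labels)
print([p for p, f in zip(data, flags) if f])  # prints [(14, 14)]
```

On this toy data, the isolated point (14, 14) joins the upper cluster but is the only point flagged; raising `factor` makes the test more conservative. Computing all pairwise distances costs O(n²) per cluster, which is fine for a sketch but worth keeping in mind for large datasets.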

Journal of Engineering and Applied Sciences, Vol. 40, Iss. 2, pp. 91-134, December 2021