IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Cluster Analysis for Outlier Detection

Cluster Analysis for Outlier Detection
View Sample PDF
Author(s): Frank Klawonn (University of Applied Sciences Braunschweig/Wolfenbuettel, Germany)and Frank Rehm (German Aerospace Center, Germany)
Copyright: 2009
Pages: 5
Source title: Encyclopedia of Data Warehousing and Mining, Second Edition
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-60566-010-3.ch035

Purchase

View Cluster Analysis for Outlier Detection on the publisher's website for pricing and purchasing information.

Abstract

For many applications in knowledge discovery in databases finding outliers, rare events, is of importance. Outliers are observations, which deviate significantly from the rest of the data, so that it seems they are generated by another process (Hawkins, 1980). Such outlier objects often contain information about an untypical behavior of the system. However, outliers bias the results of many data mining methods like the mean value, the standard deviation or the positions of the prototypes of k-means clustering (Estivill-Castro, 2004; Keller, 2000). Therefore, before further analysis or processing of data is carried out with more sophisticated data mining techniques, identifying outliers is a crucial step. Usually, data objects are considered as outliers, when they occur in a region of extremely low data density. Many clustering techniques like possibilistic clustering (PCM) (Krishnapuram & Keller, 1993; Krishnapuram & Keller, 1996) or noise clustering (NC) (Dave, 1991; Dave & Krishnapuram, 1997) that deal with noisy data and can identify outliers, need good initializations or suffer from lack of adaptability to different cluster sizes (Rehm, Klawonn & Kruse, 2007). Distance-based approaches (Knorr, 1998; Knorr, Ng & Tucakov, 2000) have a global view on the data set. These algorithms can hardly treat data sets containing regions with different data density (Breuning, Kriegel, Ng & Sander, 2000). In this work we present an approach that combines a fuzzy clustering algorithm (Höppner, Klawonn, Kruse & Runkler, 1999) (or any other prototype-based clustering algorithm) with statistical distribution-based outlier detection.

Related Content

Girija Ramdas, Irfan Naufal Umar, Nurullizam Jamiat, Nurul Azni Mhd Alkasirah. © 2024. 18 pages.
Natalia Riapina. © 2024. 29 pages.
Xinyu Chen, Wan Ahmad Jaafar Wan Yahaya. © 2024. 21 pages.
Fatema Ahmed Wali, Zahra Tammam. © 2024. 24 pages.
Su Jiayuan, Jingru Zhang. © 2024. 26 pages.
Pua Shiau Chen. © 2024. 21 pages.
Minh Tung Tran, Thu Trinh Thi, Lan Duong Hoai. © 2024. 23 pages.
Body Bottom