Cluster Analysis for Outlier Detection
Author(s): Frank Klawonn (University of Applied Sciences Braunschweig/Wolfenbuettel, Germany) and Frank Rehm (German Aerospace Center, Germany)
Copyright: 2009
Pages: 5
Source title: Encyclopedia of Data Warehousing and Mining, Second Edition
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-60566-010-3.ch035
Abstract
For many applications of knowledge discovery in databases, finding outliers, i.e., rare events, is important. Outliers are observations that deviate so significantly from the rest of the data that they seem to have been generated by a different process (Hawkins, 1980). Such outliers often carry information about atypical behavior of the system. However, outliers also bias the results of many data mining methods, for example the mean value, the standard deviation, or the positions of the prototypes in k-means clustering (Estivill-Castro, 2004; Keller, 2000). Therefore, identifying outliers is a crucial step before the data are further analyzed or processed with more sophisticated data mining techniques. Usually, data objects are considered outliers when they occur in a region of extremely low data density. Many clustering techniques that deal with noisy data and can identify outliers, such as possibilistic clustering (PCM) (Krishnapuram & Keller, 1993; Krishnapuram & Keller, 1996) or noise clustering (NC) (Dave, 1991; Dave & Krishnapuram, 1997), either need good initializations or lack adaptability to different cluster sizes (Rehm, Klawonn & Kruse, 2007). Distance-based approaches (Knorr, 1998; Knorr, Ng & Tucakov, 2000) take a global view of the data set; such algorithms can hardly handle data sets containing regions of different data density (Breunig, Kriegel, Ng & Sander, 2000). In this work we present an approach that combines a fuzzy clustering algorithm (Höppner, Klawonn, Kruse & Runkler, 1999), or any other prototype-based clustering algorithm, with statistical distribution-based outlier detection.
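The abstract does not spell out which clustering variant or statistical test the chapter uses, so the following is only a minimal sketch of the general idea under stated assumptions: hard k-means (one prototype-based algorithm) stands in for fuzzy clustering, and a simplified, Grubbs-style sweep over each cluster's distances to its prototype stands in for the statistical test. The function names `kmeans` and `cluster_outliers`, the farthest-first seeding, and the threshold `z_threshold=3.0` are all hypothetical choices for illustration, not the authors' method.

```python
import math
import statistics


def farthest_first_init(points, k):
    # Assumed deterministic seeding: start at the first point, then add the
    # point farthest from all chosen centers. Note that this seeding can itself
    # pick outliers, one face of the initialization sensitivity noted above.
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns (prototypes, cluster label per point)."""
    centers = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest prototype.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each prototype moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's prototype
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment so labels always match the returned prototypes.
    labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
    return centers, labels


def cluster_outliers(points, k, z_threshold=3.0):
    """Return sorted indices of points that are extreme within their cluster.

    Per cluster, repeatedly flag the member whose distance to the prototype has
    the largest z-score, as long as that z-score exceeds z_threshold, then
    recompute mean and standard deviation without it (one-sided test).
    """
    centers, labels = kmeans(points, k)
    outliers = []
    for j in range(k):
        # (index, distance to prototype) for every member of cluster j
        members = [(i, math.dist(p, centers[j])) for i, p in enumerate(points) if labels[i] == j]
        while len(members) >= 3:
            dists = [d for _, d in members]
            mu, sigma = statistics.mean(dists), statistics.stdev(dists)
            if sigma == 0:
                break
            idx, d = max(members, key=lambda m: m[1])
            if (d - mu) / sigma <= z_threshold:
                break
            outliers.append(idx)
            members = [m for m in members if m[0] != idx]
    return sorted(outliers)


# Two tight 5x4 grids plus one isolated point (index 40).
cluster_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
cluster_b = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(5) for j in range(4)]
points = cluster_a + cluster_b + [(0.0, 5.0)]
print(cluster_outliers(points, k=2))  # → [40]
```

In this toy run the isolated point joins the first cluster and drags its prototype from (0.2, 0.15) to roughly (0.19, 0.38), which illustrates the bias on prototype positions described above. Because the test runs inside each cluster rather than on the whole data set, clusters of different density are each judged against their own spread.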
Related Content
Girija Ramdas, Irfan Naufal Umar, Nurullizam Jamiat, Nurul Azni Mhd Alkasirah.
© 2024.
18 pages.
|
Natalia Riapina.
© 2024.
29 pages.
|
Xinyu Chen, Wan Ahmad Jaafar Wan Yahaya.
© 2024.
21 pages.
|
Fatema Ahmed Wali, Zahra Tammam.
© 2024.
24 pages.
|
Su Jiayuan, Jingru Zhang.
© 2024.
26 pages.
|
Pua Shiau Chen.
© 2024.
21 pages.
|
Minh Tung Tran, Thu Trinh Thi, Lan Duong Hoai.
© 2024.
23 pages.