Information Resources Management Association

Unbalanced Sequential Data Classification using Extreme Outlier Elimination and Sampling Techniques

Author(s): T. Maruthi Padmaja (University of Hyderabad (UoH), India), Raju S. Bapi (University of Hyderabad (UoH), India) and P. Radha Krishna (SET Labs, Infosys Technologies Ltd, India)
Copyright: 2012
Pages: 11
Source title: Pattern Discovery Using Sequence Data Mining: Applications and Studies
Source Author(s)/Editor(s): Pradeep Kumar (Indian Institute of Management, India), P. Radha Krishna (Infosys Technologies Limited, India) and S. Bapi Raju (University of Hyderabad, India)
DOI: 10.4018/978-1-61350-056-9.ch005

Abstract

Predicting minority class sequence patterns from noisy, unbalanced sequential datasets is a challenging task. To address this problem, we propose a new approach that combines extreme outlier elimination with a hybrid sampling technique. We use the k Reverse Nearest Neighbors (kRNN) concept as a data cleaning method to eliminate extreme outliers in minority regions. The hybrid sampling technique, which combines SMOTE to oversample the minority class sequences with random undersampling of the majority class sequences, is used to improve minority class prediction. The method was evaluated in terms of minority class precision, recall, and F-measure on Hill-Valley, a synthetically simulated, highly overlapped sequential dataset. We conducted the experiments with a k-Nearest Neighbour classifier and compared the performance of our approach against a simple hybrid sampling technique. Results indicate that our approach does not sacrifice one class in favor of the other, but produces high prediction rates for both the minority and majority classes.
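
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; it rests on several assumptions that the abstract does not confirm: each sequence is represented as a fixed-length numeric tuple, distances are Euclidean, kRNN counts are computed within the minority class only, and a minority point with zero reverse nearest neighbours is treated as an "extreme outlier". The function names and the balancing target (the mean of the two class sizes) are hypothetical choices made for illustration.

```python
import math
import random


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    dists = sorted((math.dist(points[i], p), j)
                   for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def krnn_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbours."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


def drop_extreme_outliers(minority, k):
    """Assumed rule: a minority point with no reverse nearest neighbours is an extreme outlier."""
    counts = krnn_counts(minority, k)
    return [p for p, c in zip(minority, counts) if c > 0]


def smote(minority, n_new, k, rng):
    """SMOTE-style oversampling: interpolate between a point and one of its k nearest neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(knn_indices(minority, i, k))
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


def hybrid_sample(minority, majority, k=3, seed=0):
    """Clean the minority class, then oversample it and undersample the majority class."""
    rng = random.Random(seed)
    cleaned = drop_extreme_outliers(minority, k)
    # Hypothetical balancing target: the mean of the two class sizes.
    target = min(len(majority), (len(cleaned) + len(majority)) // 2)
    n_new = max(0, target - len(cleaned))
    minority_out = cleaned + smote(cleaned, n_new, k, rng)
    majority_out = rng.sample(majority, target)  # random undersampling
    return minority_out, majority_out
```

The two returned lists could then be merged into a training set for a k-Nearest Neighbour classifier, with minority class precision, recall, and F-measure computed on held-out data. The outlier threshold and the degree of over- and undersampling would need tuning for any real dataset.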
