IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Intelligent Data Analysis: Developing New Methodologies Through Pattern Discovery and Recovery
Author(s)/Editor(s): Hsiao-Fan Wang (National Tsing Hua University, ROC)
Copyright: ©2009
DOI: 10.4018/978-1-59904-982-3
ISBN13: 9781599049823
ISBN10: 1599049821
EISBN13: 9781599049830

Description

Pattern recognition has a long history of application to data analysis in business, military, and socio-economic activities. While the aim of pattern recognition is to discover the pattern in a data set, the size of the data set is closely related to the methodology one adopts for analysis.

Intelligent Data Analysis: Developing New Methodologies Through Pattern Discovery and Recovery tackles data sets at both extremes of size, the very large and the very small, and covers a variety of issues in intelligent data analysis so that patterns of frequent or rare events in spatial or temporal spaces can be revealed. This book brings together current research, results, problems, and applications from both theoretical and practical approaches.



Preface

Intelligent Data Analysis provides learning tools, based on artificial intelligence, for finding patterns in data. Pattern recognition has a long history of application to data analysis in business, military, and socio-economic activities. While the aim of pattern recognition is to discover the pattern in a data set, the size of the data set is closely related to the methodology one adopts for analysis. The classic approach applies statistical techniques to data sets of more than 30 samples and reveals the pattern through dimension reduction. With the rapid growth of the Internet and its usage, the amount of data has become enormous. The term Data Warehouse has been used to describe such quantities of data, and the corresponding methods of analysis fall under the title of Data Mining.

In contrast to these huge data sets, there is another type of data set that is very small (fewer than 30 samples) but still significant in terms of socio-economic cost. Consider severe earthquakes, random terrorist attacks, and nuclear plant explosions: such events occur so rarely that conventional statistical assumptions cannot be verified, and the methods therefore fail to apply. Predicting such events remains a challenge for researchers. This creates the need to recover a pattern by constructing data.

Apart from these two extremes in the amount of data, which affect the method of analysis to be adopted, the type of data is another major factor to consider. In reality, collected data are never complete, so a certain degree of uncertainty is always embedded in them. The classical approach to coping with uncertainty is based on Probability Theory, which treats it as random in nature. As different methodologies and observations have been investigated, data types other than random ones have been studied and explored. Among these, fuzzy data, grey data, and coarse data, together with their hybrid forms, have been studied most extensively. The results pave the way to finding data patterns that move from binary groupings to degrees of belonging, in a more accurate and precise manner.

For all of these data types, in both quantity and quality, a group of heuristic approaches, namely Soft Computing (or Computational Intelligence), has been developed alongside Probability Inference and employed in different areas of application. Fuzzy Logic, Evolutionary Computing, Neural Net Analysis, and related methods have shown their capability in coping with such data sets. Intelligent data analysis is thus both an art and a science.

Because pattern recognition has been a learning process ever since living beings began, the literature on classical approaches to classifying data into binary groups is enormous. Owing to the increasing impact of extreme events on socio-economic costs, 38 authors from 10 different countries have contributed their findings to the 18 chapters of this book, each addressing different issues of intelligent pattern discovery and recovery from both theoretical and practical viewpoints. Readers will benefit from the integration of these two extreme cases in a comparative manner.

The book is organized into four sections. After an introduction to up-to-date developments and research on methodologies and data properties in Section I, issues and resolutions of pattern discovery from huge data sets are discussed in Section II and applied in Section III. Finally, Section IV presents methodological developments and possible applications of pattern recovery from small data sets. As the unbalanced numbers of chapters on huge and small data sets indicate, methods for pattern recovery from small data sets require the devotion of more researchers. The outline of each chapter is as follows:

Section I, the Introduction, includes five chapters:

Chapter I provides a software platform for automatic data analysis that uses a fuzzy knowledge base to automatically select and execute data analysis methods. The authors show that a system based on a fuzzy pattern base storing heuristic expert knowledge of data analysis can successfully achieve automatic intelligent data analysis. The system is therefore able to support business users in running data analysis projects more efficiently.

Chapter II provides a rigorous theory of random fuzzy sets in their most general form. The focus is on imprecise data that are both random and fuzzy. Critical issues relating to such data of a hybrid nature are discussed, and a framework based on Probability Theory is proposed for their analysis.

Chapter III highlights meaningful pattern discovery techniques for gene expression data. The properties of gene expression data themselves are examined, and possible patterns are suggested. Classes of clustering techniques are investigated in the context of their application to gene expression data, and a comparative analysis of standard and non-standard methods is given, along with suggested areas for future development.

Chapter IV describes the use of fast data-mining algorithms such as TreeNet and Random Forests (Salford Systems Ltd) to identify ecologically meaningful patterns and relationships in subsets of data that carry various degrees of outliers and uncertainty. An example using satellite data from a wintering Golden Eagle shows that the proposed approach provides a promising tool for wildlife ecology and conservation management.

Chapter V applies Atanassov's theory of intuitionistic fuzzy sets to analyze imbalanced and overlapping classes by defining both a membership degree and a non-membership degree for each member. Because imbalanced and overlapping classes are a real challenge for standard classifiers, the method proposed in this chapter is important not only in theory but also for many different types of real tasks.
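
To make the two degrees concrete: in Atanassov's intuitionistic fuzzy sets, each element x carries a membership degree μ(x) and a non-membership degree ν(x) with μ(x) + ν(x) ≤ 1, and the remainder π(x) = 1 − μ(x) − ν(x) is its hesitation margin. The Python sketch below, built around a hypothetical `IFValue` class, only encodes these definitions; how the chapter actually assigns μ and ν to members of imbalanced or overlapping classes is not described here and is not reproduced.

```python
from dataclasses import dataclass

_EPS = 1e-12  # tolerance for floating-point round-off in the sum check


@dataclass(frozen=True)
class IFValue:
    """Hypothetical container for one element's intuitionistic fuzzy degrees:
    membership mu and non-membership nu, with 0 <= mu + nu <= 1."""
    mu: float
    nu: float

    def __post_init__(self):
        in_range = 0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0
        if not in_range or self.mu + self.nu > 1.0 + _EPS:
            raise ValueError("need mu, nu in [0, 1] and mu + nu <= 1")

    @property
    def hesitation(self) -> float:
        # pi = 1 - mu - nu: degree left undecided between the two;
        # clamped at 0 so round-off never yields a tiny negative value.
        return max(0.0, 1.0 - self.mu - self.nu)


# An element in an overlap region: partly in, partly out, partly undecided.
overlap_point = IFValue(mu=0.4, nu=0.3)
# An ordinary fuzzy value is the special case nu = 1 - mu (no hesitation).
classical = IFValue(mu=0.8, nu=0.2)
```

The hesitation margin is what distinguishes this model from an ordinary fuzzy set, where non-membership is fixed as 1 − μ.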

Section II, on methodologies for pattern discovery from huge data sets, contains five chapters:

Chapter VI introduces fuzzy neural network models as a means of knowledge discovery from databases. It not only describes architectures and learning algorithms for fuzzy neural networks but also proposes an algorithm for extracting and optimizing classification rules from a trained fuzzy neural network. An example using multispectral satellite images shows that the presented models, and the methodology for generating classification rules from data samples, provide a valuable tool for knowledge discovery.

Chapter VII discusses the paradigm of genetic algorithms and their incorporation into machine learning. Special attention is given to three issues: (a) ways of initializing the population of a genetic algorithm, (b) representation of chromosomes in genetic algorithms, and (c) discretization and fuzzification of numerical attributes for genetic algorithms. Furthermore, this chapter surveys new trends in dealing with variable-length chromosomes and other issues related to genetic learners.

Chapter VIII introduces evolutionary computing as a whole and discusses in detail two sub-areas of nature-inspired computing within Evolutionary Computing, namely Evolutionary Algorithms and Swarm Intelligence. The theoretical background of these sub-areas is illustrated with demonstrations of some real-world applications. The chapter also points out future trends and directions in these areas.

Chapter IX proposes two composite approaches that combine conventional data fitting with peak matching to cope with noisy data when solving, by least-squares global optimization, an inverse light scattering problem for single, spherical, homogeneous particles. The authors show that these approaches lead to a more robust identification procedure.

Chapter X introduces Markov chain Monte Carlo (MCMC) methods for the exact simulation of sample values from complex distributions. The proposed algorithm facilitates the implementation of a Markov chain that has a given distribution as its stationary distribution. Applications of these algorithms to probabilistic data analysis and inference are given.
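
As a rough illustration of how a chain can be built to have a given stationary distribution, here is a minimal random-walk Metropolis sampler in Python. It is a standard MCMC construction, not the chapter's exact-simulation algorithm: its draws are only approximately distributed as the target after burn-in. The function name `metropolis`, the Gaussian proposal, the step size, and the burn-in length are illustrative assumptions.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler (hypothetical helper).

    log_target: log of a target density known up to a constant; the
    normalized target is the stationary distribution of this chain.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for i in range(burn_in + n_samples):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to
        # the ratio of target densities p(y) / p(x).
        y = x + rng.gauss(0.0, step)
        log_q = log_target(y)
        # Accept with probability min(1, p(y) / p(x)).
        if rng.random() < math.exp(min(0.0, log_q - log_p)):
            x, log_p = y, log_q
        if i >= burn_in:  # discard the early, not-yet-mixed part of the chain
            samples.append(x)
    return samples


# Demo target: standard normal, given only up to its normalizing constant.
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Because the proposal is symmetric, the acceptance step needs only the ratio of target densities, which is why the target may be known only up to a normalizing constant.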

Section III, on applications of pattern discovery from huge data sets, contains five cases from the manufacturing, transportation, and service sectors:

Chapter XI provides suitable Knowledge Bases (KBs) for carrying out forward and reverse mappings of the Tungsten Inert Gas (TIG) welding process. Both forward and reverse mappings are required for effective on-line control of a process. Although conventional statistical regression analysis can carry out the forward mapping efficiently, it may not always be able to solve the reverse mapping problem. Fuzzy Logic (FL)-based approaches are adopted to conduct both mappings of the TIG welding process and are shown to solve this problem efficiently.

Chapter XII concerns a problem of road travel in the US, namely discerning the levels of traffic fatalities across the individual states. Given the cognitive uncertainty evident in the imprecision inherent in the data values, a fuzzy decision tree approach is adopted for inference. The results show that inference from the tree structure takes advantage of the human ability to distinguish between patterns and observable characteristics.

Chapter XIII provides a method to resolve the major problem of time discontinuity resulting from the transactional character of events in the telecom market. By gradually enriching the information content of the data, from prior lifetime expectancy through standard static event data up to decay-weighted data sequences, the proposed sequential processing of appropriately preprocessed data streams is shown to improve customer churn prediction.

Chapter XIV applies Dempster-Shafer Theory to object classification and ranking. Based on this theory, a method called CaRBS is proposed, and its application to uncertain reasoning in Moody’s Bank Financial Strength Rating (BFSR) process is demonstrated. The value of this chapter lies in its measures of ignorance, which allow a decision on whether to adopt or abandon the existing evidence to be made during a series of classification and ranking analyses.
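
Dempster-Shafer Theory expresses ignorance by assigning mass to the whole frame of discernment rather than to a single outcome, and combines independent pieces of evidence with Dempster's rule. The Python sketch below implements that generic rule for a two-outcome frame. It is not CaRBS itself, whose construction of mass functions from the rating variables is not shown; the helper `dempster_combine` and all mass values are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule (hypothetical helper).

    Each mass function maps frozenset focal elements to masses summing to 1.
    Mass falling on the empty set is the conflict K; it is discarded and the
    remaining masses are renormalized by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence; rule undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict


X, NOT_X = frozenset({"x"}), frozenset({"not_x"})
THETA = X | NOT_X  # mass on the whole frame expresses ignorance

m1 = {X: 0.6, THETA: 0.4}              # evidence partly supporting x
m2 = {X: 0.3, NOT_X: 0.5, THETA: 0.2}  # mixed evidence
m, k = dempster_combine(m1, m2)
```

The mass left on `THETA` after combination is the kind of ignorance measure the chapter uses when deciding whether evidence is worth keeping.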

Chapter XV illustrates how to describe an individual’s preference structure and use its properties to define the individual’s level of risk for the risk confronted. A response evaluation model is then proposed to develop an appropriate response strategy. These two stages, risk analysis and risk response, together form a complete Individual Risk Management (IRM) process. A case of an A-C court is demonstrated, and the results show that the proposed method provides more useful and pertinent information than the traditional decision tree method based on Expected Monetary Value (EMV).

Section IV contains three chapters on current methodologies developed for analyzing small sample sets, with illustrations of their applications.

Chapter XVI introduces the use of the bootstrap in a nonlinear, nonparametric regression framework with dependent errors. The AR-Sieve bootstrap and the Moving Block bootstrap, which generate bootstrap replicates with a proper dependence structure, are used to avoid the inconsistency that the conventional bootstrap method suffers under dependent errors. In the framework of neural network models, which are often used as accurate nonparametric estimation and prediction tools, both procedures are shown to give satisfactory results.
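
The key idea of the Moving Block bootstrap is to resample whole blocks of consecutive observations, so that dependence within each block survives in every replicate. Below is a minimal Python sketch, assuming a plain list as the series, an arbitrary block length of 10, and a simulated AR(1) series as demo data. The AR-Sieve bootstrap and the chapter's neural network regression are not shown, and `moving_block_bootstrap` is a hypothetical helper name.

```python
import random


def moving_block_bootstrap(series, block_len, rng):
    """Return one moving-block-bootstrap replicate of `series`.

    Overlapping blocks of `block_len` consecutive observations are drawn
    with replacement and concatenated; the result is cut to len(series).
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    n_starts = n - block_len + 1  # number of overlapping blocks available
    out = []
    while len(out) < n:
        start = rng.randrange(n_starts)  # each block start equally likely
        out.extend(series[start:start + block_len])
    return out[:n]


rng = random.Random(42)
# Hypothetical demo data: AR(1) series x_t = 0.6 * x_{t-1} + e_t.
x = [0.0]
for _ in range(199):
    x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))

reps = [moving_block_bootstrap(x, block_len=10, rng=rng) for _ in range(500)]
boot_means = [sum(r) / len(r) for r in reps]  # bootstrap distribution of the mean
```

The block length trades off bias against variance: longer blocks preserve more of the dependence structure but leave fewer distinct blocks to resample.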

Chapter XVII proposes a methodology based on a Hilbert-EMD-based support vector machine (SVM) to predict financial crisis events for early-warning purposes. A typical financial indicator, the currency exchange rate, which reflects economic fluctuation conditions, is first chosen. The Hilbert-EMD algorithm is then applied to the economic indicator series. This chapter also applies the proposed method to two real-world cases, South Korea and Thailand, both of which suffered from the disastrous 1997-1998 financial crisis. The results show that the proposed Hilbert-EMD-based SVM methodology is capable of predicting financial crisis events effectively.

Chapter XVIII proposes an alternative approach named Data Construction Analysis (DCA) to overcome problems arising from insufficient data, in particular the defects of a commonly used approach called the Intervalized Kernel method of Density Estimation (IKDE). Comparative studies show that DCA not only resolves the problem of insufficient data in general but also improves on IKDE in both the degree and the stability of prediction accuracy.

As the outline above shows, this book will be useful to both researchers and practitioners who want comprehensive views of, and insights into, the variety of issues in pattern discovery and recovery that it covers. In particular, those working in data analysis will gain an overall picture of existing and potential developments in intelligent pattern recognition.

Reviews and Testimonials

This book will be useful for both researchers and practitioners who are interested in receiving comprehensive views and insights from the variety of issues covered in relation to pattern discovery and recovery.

– Hsiao-Fan Wang, National Tsing Hua University, Taiwan, ROC

In these 28 articles, editor Wang and her contributors emphasize new discoveries that have widened methodological choices even further in the analysis of artificial data.

– Book News Inc. (March 2009)

Intended for the computer-science professional, each essay is rich with field-specific jargon, but the topics covered are also relevant to the concerns of e-commerce administrators, public interest groups, and civil policymakers.

– Library Journal (June 2009)

Author's/Editor's Biography

Hsiao-Fan Wang (Ed.)
Hsiao-Fan Wang is the Tsing Hua Chair Professor and the Vice Dean of the College of Engineering of National Tsing Hua University (Taiwan). She has taught in the Department of Industrial Engineering and Engineering Management (IEEM) at NTHU since graduating from Cambridge University (UK) in 1981. She has served as Head of the Department of IEEM, NTHU; President of the Chinese Fuzzy Systems Association; Vice President of the International Fuzzy Systems Association; and Erskine Fellow of Canterbury University, NZ. She has also received the Distinguished Research Award from the National Science Council (NSC) of Taiwan (ROC), been named a Distinguished Contracted Research Fellow of the NSC, and received the Distinguished Teaching Award of the Engineering College, NTHU. She was formerly editor-in-chief of the Journal of Chinese Industrial Engineering Association and of the Journal of Chinese Fuzzy Set and Theories, and she is now an area editor of several international journals. Her research interests are in multicriteria decision making, fuzzy set theory, and operations research.
