IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Data

Copyright: 2023
Pages: 15
Source title: Principles and Theories of Data Mining With RapidMiner
Source Author(s)/Editor(s): Sarawut Ramjan (Thammasat University, Thailand) and Jirapon Sunkpho (Thammasat University, Thailand)
DOI: 10.4018/978-1-6684-4730-7.ch002

Abstract

The first step for a data scientist addressing a business question is to identify the type of the data, because not every data type can be used in every data mining analysis. The data scientist must therefore classify the data as categorical or continuous, regardless of its source, and select data whose type suits the intended data mining technique. Quality control is also a significant concern, particularly when data collection was poorly designed or administered, which leads to problems such as missing values. Once a relevant dataset has been acquired, the data scientist should inspect each feature for outliers to make sure the data is suitable for analysis. Data scientists commonly look for outliers with visualizations such as scatter plots, a step that itself depends on knowing each feature's data type, which underscores how crucial data type determination is.
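The chapter itself works in RapidMiner. As a rough, tool-agnostic illustration of the workflow the abstract describes (typing each feature as categorical or continuous, counting missing values, then screening continuous features for outliers), here is a minimal Python sketch using only the standard library. The helper names, the 10-distinct-value threshold for calling a numeric feature continuous, and the use of Tukey's 1.5 × IQR fences as the outlier rule are all assumptions made for illustration, not the authors' method; the abstract describes spotting outliers visually, for example with scatter plots.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def classify_feature(values, max_levels=10):
    """Label a feature 'continuous' or 'categorical'.

    Heuristic (an assumption): numeric features with more than
    max_levels distinct values are continuous; everything else is categorical.
    """
    present = [v for v in values if not is_missing(v)]
    if present and all(is_number(v) for v in present) and len(set(present)) > max_levels:
        return "continuous"
    return "categorical"


def count_missing(values):
    return sum(1 for v in values if is_missing(v))


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if not is_missing(v)]
    if len(present) < 4:  # too few points for meaningful quartiles
        return []
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


def profile(dataset, max_levels=10):
    """Summarize type, missing-value count, and outliers for each feature."""
    report = {}
    for name, values in dataset.items():
        kind = classify_feature(values, max_levels)
        report[name] = {
            "type": kind,
            "missing": count_missing(values),
            # Outlier screening only makes sense for continuous features here.
            "outliers": iqr_outliers(values) if kind == "continuous" else [],
        }
    return report


if __name__ == "__main__":
    # Hypothetical toy data, invented for illustration.
    data = {
        "income": [42, 45, 47, 50, 51, 53, 55, 56, 58, 60, 61, 400, None],
        "region": ["north", "south", None, "east", "north"],
    }
    for feature, summary in profile(data).items():
        print(feature, summary)
    # income -> continuous, 1 missing, outliers [400]
    # region -> categorical, 1 missing, outliers []
```

In practice, a flagged value such as 400 would be a candidate for closer inspection (for example on a scatter plot) rather than automatic removal; whether it is an error or a genuine extreme depends on the business context.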
