Arun Karnik
Apr 5, 2021

Steps Involved in Data Pre-processing:

1. Data Cleaning:

Raw data can contain many irrelevant and missing parts. Data cleaning is done to handle them; it involves handling missing data, noisy data, and similar problems.

(a) Missing Data:
This situation arises when some values are missing from the dataset. It can be handled in various ways; some of them are:

  1. Ignore the tuples:
    This approach is suitable only when the dataset is quite large and multiple values are missing within a tuple.
  2. Fill in the missing values:
    You can fill the missing values manually, with the attribute mean, or with the most probable value.
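
The two strategies above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the toy `rows` dataset and the helper names `ignore_tuples`, `fill_with_mean`, and `fill_with_mode` are hypothetical, missing values are assumed to be marked with `None`, and the column's most frequent value (its mode) stands in for "the most probable value".

```python
from collections import Counter
from statistics import mean

# Hypothetical toy dataset of (age, income, city) tuples; None marks a missing value.
rows = [
    (20, 50000, "Pune"),
    (None, 62000, "Mumbai"),
    (40, None, "Pune"),
    (30, 56000, None),
    (None, None, "Delhi"),
]


def ignore_tuples(data, max_missing=1):
    """Drop every tuple that has more than `max_missing` missing values."""
    return [r for r in data if sum(v is None for v in r) <= max_missing]


def _replace_missing(data, col, value):
    """Return a copy of `data` with None in column `col` replaced by `value`."""
    return [r[:col] + (value,) + r[col + 1:] if r[col] is None else r for r in data]


def fill_with_mean(data, col):
    """Fill missing values in a numeric column with the mean of the observed values."""
    observed = [r[col] for r in data if r[col] is not None]
    return _replace_missing(data, col, mean(observed))


def fill_with_mode(data, col):
    """Fill missing values with the column's most frequent observed value
    (a simple stand-in for 'the most probable value')."""
    observed = [r[col] for r in data if r[col] is not None]
    return _replace_missing(data, col, Counter(observed).most_common(1)[0][0])


cleaned = ignore_tuples(rows)          # drops the Delhi tuple (two values missing)
cleaned = fill_with_mean(cleaned, 0)   # missing age -> mean of observed ages
cleaned = fill_with_mean(cleaned, 1)   # missing income -> mean of observed incomes
cleaned = fill_with_mode(cleaned, 2)   # missing city -> most frequent city
print(cleaned)
```

Because tuples are dropped before imputing, the mean and mode here are computed only from the tuples that are kept; the threshold for "too many missing values" is a judgment call for each dataset.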

(b) Noisy Data:
Noisy data is meaningless data that can’t be interpreted by machines. It can result from faulty data collection, data entry errors, and similar problems.

2. Data Transformation:

This step transforms the data into forms suitable for the mining process.

It involves the following techniques:

  1. Normalization:
    It scales the data values into a specified range (-1.0 to 1.0 or 0.0 to 1.0).
  2. Attribute Construction:
    In this strategy, new attributes are constructed from the given set of attributes to help the mining process.
  3. Discretization:
    This is done to replace the raw values of numeric attributes with interval labels (e.g. 0-10, 11-20) or conceptual labels (e.g. youth, adult, senior).
  4. Concept Hierarchy Generation:
    Here attributes are converted from a lower level to a higher level in the concept hierarchy. For example, the attribute “city” can be converted to “country”.
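
A minimal Python sketch of the four transformations in the list above, using only the standard library. Everything concrete here is a hypothetical illustration rather than part of the original post: min-max scaling is assumed as the normalization method, and the function names, the `width`/`height` attributes used for constructing a new attribute, the age cut-offs, and the city-to-country mapping are all made up for the example.

```python
# Normalization: min-max scaling of a numeric attribute into [new_min, new_max].
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant attribute")
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


# Constructing a new attribute from existing ones.
# Hypothetical example: area = width * height.
def add_area(records):
    return [{**r, "area": r["width"] * r["height"]} for r in records]


# Discretization: replace raw numeric values with conceptual labels.
# Hypothetical age cut-offs; each bin is (inclusive upper bound, label).
AGE_BINS = [(12, "child"), (19, "youth"), (59, "adult"), (float("inf"), "senior")]


def discretize(value, bins):
    for upper, label in bins:
        if value <= upper:
            return label
    raise ValueError(f"{value} is above the last bin")


# Concept hierarchy generation: climb from "city" to "country".
# Hypothetical one-level hierarchy.
CITY_TO_COUNTRY = {"Pune": "India", "Mumbai": "India", "Paris": "France"}


def generalize(values, hierarchy):
    return [hierarchy[v] for v in values]


incomes = [200, 300, 400, 600, 1000]
scaled_01 = min_max_normalize(incomes)              # range 0.0 .. 1.0
scaled_pm1 = min_max_normalize(incomes, -1.0, 1.0)  # range -1.0 .. 1.0
shapes = add_area([{"width": 2, "height": 3}])
age_groups = [discretize(a, AGE_BINS) for a in [8, 17, 35, 70]]
countries = generalize(["Pune", "Mumbai", "Paris"], CITY_TO_COUNTRY)
print(scaled_01, scaled_pm1, shapes, age_groups, countries, sep="\n")
```

Min-max scaling maps the smallest observed value to `new_min` and the largest to `new_max`, so it is sensitive to outliers; a constant attribute is rejected because its range would be zero.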

3. Data Reduction:

Data mining is used to handle huge amounts of data, and analysis becomes harder as the volume of data grows. Data reduction techniques address this problem: they aim to increase storage efficiency and reduce data storage and analysis costs.
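
The post does not name a particular reduction technique. As one illustrative assumption, simple random sampling without replacement keeps a smaller subset of the records so that later analysis runs on less data. The `random_sample` helper and the 10,000-record dataset below are hypothetical.

```python
import random


def random_sample(records, fraction, seed=None):
    """Simple random sampling without replacement: keep about `fraction` of the records."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    k = max(1, round(len(records) * fraction))
    rng = random.Random(seed)  # seeded generator so the sample is reproducible
    return rng.sample(records, k)


data = list(range(1, 10_001))  # hypothetical dataset of 10,000 records
sample = random_sample(data, 0.01, seed=42)
print(len(sample))  # 100
```

Sampling reduces the number of records rather than the number of attributes, and it trades some accuracy for lower storage and analysis cost.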

Arun Karnik specializes in Business applications & AI, CRM, Chatbots, Automation, Data Analytics and participated in WIPO training & Digital Masters Conference.