Exploratory Data Analysis (EDA) in Python
NumPy:
NumPy is the fundamental package for scientific computing with Python. It is simple yet powerful.
It includes:
- A powerful N-dimensional array object
- Sophisticated (broadcasting) functions
- Tools for integrating C/C++ and Fortran code
- Basic linear algebra, Fourier transform, and random number capabilities
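A minimal sketch of the capabilities listed above, using only standard NumPy calls. The arrays, the seed, and the variable names are made up for illustration.

```python
import numpy as np

# A 2-D array (matrix) built from a nested list
a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(a.shape, a.ndim)

# Broadcasting: the 1-D row is stretched across both rows of `a`,
# so each row gets [10, 20] added, giving [[11, 22], [13, 24]]
row = np.array([10.0, 20.0])
shifted = a + row

# Basic linear algebra: solve the system a @ x = b (solution is x = [1, 2])
b = np.array([5.0, 11.0])
x = np.linalg.solve(a, b)

# Fourier transform of a short signal (result is complex: [0, 2, 0, 2])
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))

# Random numbers from a seeded generator, so runs are reproducible
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=0.0, scale=1.0, size=3)
```

Broadcasting is what lets `a + row` work without manually copying `row` into a 2x2 array.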
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined, which allows NumPy to integrate seamlessly and speedily with a wide variety of projects.
The main advantage of NumPy is fast mathematical and logical operations on whole arrays.
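As a sketch of NumPy acting as a container of generic data, the example below defines a structured data type. The record layout (`name`, `age`, `weight`) and the values are hypothetical.

```python
import numpy as np

# Hypothetical record type: a name (up to 10 Unicode chars), an integer age, a float weight
person = np.dtype([("name", "U10"), ("age", np.int32), ("weight", np.float64)])

people = np.array(
    [("Alice", 34, 61.5), ("Bob", 28, 80.2), ("Chen", 45, 72.0)],
    dtype=person,
)

# Fields are accessed by name and come back as ordinary typed arrays
ages = people["age"]
mean_weight = people["weight"].mean()

# Boolean indexing selects whole records
over_30 = people[people["age"] > 30]
```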
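A small sketch of element-wise mathematical and logical operations on an array, with no explicit Python loop. The temperature values are invented for illustration.

```python
import numpy as np

# Hypothetical daily temperatures in degrees Celsius
temps_c = np.array([12.0, 18.5, 25.0, 31.0, 8.0])

# Element-wise arithmetic applied to the whole array at once
temps_f = temps_c * 9 / 5 + 32

# Element-wise comparisons produce a boolean mask; & combines masks
warm = (temps_c > 15) & (temps_c < 30)

# The mask can filter, count, or test the array
warm_days = temps_c[warm]
n_warm = np.count_nonzero(warm)
any_freezing = np.any(temps_c <= 0)
```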
Pandas:
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.
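A minimal sketch of pandas time series handling, assuming two weeks of made-up daily readings. It uses a `DatetimeIndex`, weekly resampling, date-based slicing, and day-to-day differences.

```python
import pandas as pd

# Hypothetical daily readings for two weeks; 2024-01-01 is a Monday
idx = pd.date_range("2024-01-01", periods=14, freq="D")
readings = pd.Series(range(14), index=idx, dtype="float64")

# Downsample to weekly means; "W" bins end on Sunday by default
weekly_mean = readings.resample("W").mean()

# Slicing with date strings selects by label (both ends inclusive)
first_three = readings.loc["2024-01-01":"2024-01-03"]

# Change from one day to the next (the first value is NaN)
daily_change = readings.diff()
```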
In short, Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package, and its key data structure is called the DataFrame. DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.
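A first EDA pass on a DataFrame might look like the sketch below. The dataset and column names (`city`, `temp_c`, `humidity`) are hypothetical; each row is one observation and each column one variable.

```python
import pandas as pd

# Small hypothetical dataset; None marks a missing humidity reading
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", "Pune", "Lima"],
        "temp_c": [3.5, 19.0, 5.0, 31.0, 21.5],
        "humidity": [80, 75, None, 60, 70],
    }
)

# Size of the table and missing values per column
n_rows, n_cols = df.shape
missing_per_column = df.isna().sum()

# Summary statistics (count, mean, std, quartiles) for numeric columns
summary = df.describe()

# Group-wise aggregation: mean temperature per city
mean_temp_by_city = df.groupby("city")["temp_c"].mean()

# Keep only the rows that satisfy a condition
warm = df[df["temp_c"] > 20]
```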
Scikit-learn:
Scikit-learn is an open source Python library that has powerful tools for data analysis and data mining. It's available under the BSD license and is built on the following libraries:
- NumPy, a library for manipulating multi-dimensional arrays and matrices. It also provides an extensive collection of mathematical functions for performing various calculations.
- SciPy, a library of algorithms for scientific and technical computing, such as optimization, integration, interpolation, linear algebra, and statistics.
- Matplotlib, a library for plotting various charts and graphs.
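To show how scikit-learn sits on top of NumPy, the sketch below fits a model and a preprocessing transformer directly on NumPy arrays. The data (an exact line `y = 2x + 1`) is made up, and Matplotlib is not used here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data on the line y = 2*x + 1, stored as plain NumPy arrays
X = np.arange(10, dtype=float).reshape(-1, 1)  # features must be 2-D: (n_samples, n_features)
y = 2.0 * X.ravel() + 1.0

# Estimators share a common interface: fit(), then predict()
model = LinearRegression().fit(X, y)
pred = model.predict(np.array([[20.0]]))

# Transformers use fit_transform() and also return NumPy arrays
X_scaled = StandardScaler().fit_transform(X)
```

Because inputs and outputs are ordinary NumPy arrays, results can be passed straight back into NumPy or pandas code.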