Pandas is an essential data analysis library within the Python ecosystem. For more details, read the Pandas Documentation.
- Pandas. Data processing
- Data structures
- Basic functionality
- Function application
- Reindexing and altering labels
- Sorting by index and value
- Indexing and selecting data
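Pandas operates with three basic data structures: Series, DataFrame, and Panel. There are extensions to this list, but for the purposes of this material the first two are more than enough. (Panel was deprecated and then removed in pandas 0.25, so current releases provide only Series and DataFrame.)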
We start by importing NumPy and Pandas using their conventional short names:
In : import numpy as np
In : import pandas as pd
In : randn = np.random.rand  # Alias to shorten the code that follows; note that rand draws uniform samples on [0, 1)
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
>>> s = pd.Series(data, index=index)
The first argument, data, can be, among other things, an array-like object, a dict, or a scalar value. If data is array-like, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
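The index rules for array-like data can be checked with a small sketch. The variable names (s_default, s_labeled, length_mismatch_raised) and the sample values are illustrative assumptions, not part of the original material:

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0])  # illustrative data, chosen for this sketch

# No index passed: pandas creates the default labels 0 .. len(data) - 1.
s_default = pd.Series(values)
print(list(s_default.index))  # [0, 1, 2]

# An explicit index must have the same length as the data.
s_labeled = pd.Series(values, index=['a', 'b', 'c'])
print(s_labeled['b'])  # 20.0

# An index of a different length is rejected with a ValueError.
try:
    pd.Series(values, index=['a', 'b'])
    length_mismatch_raised = False
except ValueError:
    length_mismatch_raised = True
print(length_mismatch_raised)  # True
```

The try/except is only there so the mismatch can be observed without stopping the script; in interactive use, the traceback itself shows the problem.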
In : s = pd.Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])
In : s
Out:
a    0.803862
b    0.474233
c    0.224809
d    0.580670
e    0.747660
dtype: float64
In : s.index