Pandas. Data processing
Pandas is an essential data analysis library within the Python ecosystem. For more details, read the Pandas documentation.
Data structures
Pandas operates with three basic data structures: Series, DataFrame, and Panel (Panel has been removed from recent pandas releases). There are extensions to this list, but for the purposes of this material the first two are more than enough.
We start by importing NumPy and Pandas using their conventional short names:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: randn = np.random.rand  # Short alias; note that np.random.rand draws uniform samples from [0, 1)
Series
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
>>> s = pd.Series(data, index=index)
The first argument, data, can be one of the following:
- array-like
- dictionary
- scalar
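As a minimal sketch of the three kinds of input listed above, the snippet below builds one Series from each. The variable names (from_array, from_dict, from_scalar) and the sample values are illustrative assumptions, not part of the original material.

```python
import numpy as np
import pandas as pd

# Array-like: values are paired with the index labels position by position.
from_array = pd.Series(np.array([1.0, 2.0, 3.0]), index=['a', 'b', 'c'])

# Dictionary: the keys become the index labels.
from_dict = pd.Series({'a': 1.0, 'b': 2.0, 'c': 3.0})

# Scalar: the single value is repeated once for every index label,
# so an index must be supplied to determine the length.
from_scalar = pd.Series(5.0, index=['a', 'b', 'c'])

print(from_array)
print(from_dict)
print(from_scalar)
```

With these particular inputs, the array-like and dictionary forms should produce the same labeled values, while the scalar form fills every label with 5.0.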
Array-like
If data is array-like, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
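The two index rules for array-like data can be checked with a short sketch. The array contents and the names values, default_indexed, and mismatch_rejected are illustrative assumptions.

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0])

# No index passed: pandas builds the default integer index 0, 1, 2.
default_indexed = pd.Series(values)
print(list(default_indexed.index))

# An index whose length differs from the data is rejected with a ValueError.
try:
    pd.Series(values, index=['a', 'b'])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("Rejected:", err)
```

Catching the exception here is only for demonstration; in ordinary code a length mismatch usually points to a bug in how the data or labels were prepared.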
In [4]: s = pd.Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])
In [5]: s
Out[5]:
a 0.803862
b 0.474233
c 0.224809
d 0.580670
e 0.747660
dtype: float64
In [6]: s.index