```python
import numpy as np
import pandas as pd
```
1. Data types
1.1 Series
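A Series is a one-dimensional labeled array. It is created with `pd.Series(data, index=index)`, where the optional `index` argument lets you define custom labels. There are three common ways to create a Series: from an ndarray, from a dict, and from a scalar value.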
1.1.1 From ndarray
If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
```python
# np.random.randn draws random numbers, so your values will differ from these.
s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
s
# a    0.469112
# b   -0.282863
# c   -1.509059
# d   -1.135632
# e    1.212112
# dtype: float64
s.index
# Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
pd.Series(np.random.randn(5))
# 0   -0.173215
# 1    0.119209
# 2   -1.044236
# 3   -0.861849
# 4   -2.104569
# dtype: float64
```
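The two rules stated above (a default index of `0 .. len(data) - 1`, and a required length match for an explicit index) can be checked directly. This is a small sketch; the variable names `s_default` and `mismatch_error` are illustrative only.

```python
import numpy as np
import pandas as pd

# No index passed: pandas labels the values 0..len(data)-1 (stored as a RangeIndex).
s_default = pd.Series(np.array([1.5, 2.5, 3.5]))
print(s_default.index)  # RangeIndex(start=0, stop=3, step=1)

# An explicit index whose length differs from the ndarray is rejected with ValueError.
mismatch_error = None
try:
    pd.Series(np.zeros(3), index=["a", "b"])
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```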
1.1.2 From dict
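Series can be instantiated from dicts. When no index is passed, the index follows the dict's insertion order (pandas >= 0.23 with Python >= 3.6):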
```python
d = {"b": 1, "a": 0, "c": 2}
pd.Series(d)
# b    1
# a    0
# c    2
# dtype: int64
```
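When an index is passed together with a dict, pandas uses the index to pick and order the values, and labels absent from the dict become `NaN`. A minimal sketch reusing the dict above:

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# Without an index, the keys keep their insertion order.
by_insertion = pd.Series(d)

# With an index, only those labels are used, in that order; "d" is not a key,
# so it becomes NaN and the integer values are upcast to float64.
s = pd.Series(d, index=["b", "c", "d", "a"])
print(s)
```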
1.1.3 From scalar value
If data is a scalar value, provide an index: the value will be repeated to match the length of index. Without an index, the result is a Series of length 1.
```python
pd.Series(5.0, index=["a", "b", "c", "d", "e"])
# a    5.0
# b    5.0
# c    5.0
# d    5.0
# e    5.0
# dtype: float64
```
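A short sketch contrasting a scalar with and without an index. The names `filled` and `single` are illustrative:

```python
import pandas as pd

# The scalar is broadcast to every label in the index.
filled = pd.Series(5.0, index=["a", "b", "c"])

# With no index, the scalar becomes a single element labeled 0.
single = pd.Series(5.0)
print(single)
```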
1.1.4 Usage
```python
# Position-based access uses .iloc; plain s[0] on a labeled index is deprecated in pandas 2.x.
s.iloc[0]
# 0.4691122999071863
s.iloc[:3]
# a    0.469112
# b   -0.282863
# c   -1.509059
# dtype: float64
# Boolean mask: keep values above the median.
s[s > s.median()]
# a    0.469112
# e    1.212112
# dtype: float64
s.iloc[[4, 3, 1]]
# e    1.212112
# d   -1.135632
# b   -0.282863
# dtype: float64
# NumPy ufuncs apply element-wise and keep the index.
np.exp(s)
# a    1.598575
# b    0.753623
# c    0.221118
# d    0.321219
# e    3.360575
# dtype: float64
s.array
# <PandasArray>   (shown as <NumpyExtensionArray> in pandas >= 2.1)
# [ 0.4691122999071863, -0.2828633443286633, -1.5090585031735124,
#  -1.1356323710171934,  1.2121120250208506]
# Length: 5, dtype: float64
s.to_numpy()
# array([ 0.4691, -0.2829, -1.5091, -1.1356,  1.2121])
# Label-based access and assignment, like a dict.
s["a"]
# 0.4691122999071863
s["e"] = 12.0
s
# a     0.469112
# b    -0.282863
# c    -1.509059
# d    -1.135632
# e    12.000000
# dtype: float64
# `in` tests the index labels.
"e" in s
# True
"f" in s
# False
```
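Because a Series can be indexed both by position and by label, the explicit accessors `.iloc` (always positional) and `.loc` (always label-based) avoid ambiguity. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series([0.5, -0.3, -1.5, -1.1, 1.2], index=["a", "b", "c", "d", "e"])

first = s.iloc[0]             # by position: same element as s.loc["a"]
picked = s.iloc[[4, 3, 1]]    # positions 4, 3, 1 -> labels e, d, b
label_slice = s.loc["b":"d"]  # label slices include the end label
pos_slice = s.iloc[1:3]       # positional slices exclude the end, like Python
```

Note the asymmetry: `s.loc["b":"d"]` returns three elements, while `s.iloc[1:3]` returns two.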
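In use, a Series behaves much like an ndarray (positional indexing, slicing, boolean masks, NumPy functions such as `np.exp`) and like a dict (lookup and assignment by label, `in` tests against the index). Use `Series.to_numpy()` to convert it to a NumPy array.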
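The sketch below illustrates both sides with made-up values: dict-style `.get` and `in`, the conversion with `to_numpy()`, and one place where a Series differs from an ndarray, namely that arithmetic aligns on labels.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# dict-like: .get returns a default instead of raising KeyError
missing = s.get("f", np.nan)

# `in` checks index labels, not values
has_a = "a" in s      # True
has_value = 1.0 in s  # False: 1.0 is a value, not a label

# to_numpy() returns a plain ndarray; the labels are dropped
arr = s.to_numpy()

# Unlike ndarray, arithmetic aligns on labels; a label present on only one side gives NaN
aligned = s + s.iloc[1:]
print(aligned)
```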
1.2 DataFrame
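A DataFrame is similar to a two-dimensional table, such as a spreadsheet or SQL table: it has labeled rows and columns, and each column can have its own dtype. It can also be thought of as a dict of Series sharing one index.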
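One way to build a DataFrame is from a dict of Series: each key becomes a column, and the row index is the union of the Series indexes. The column names and values here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
        "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
        "name": pd.Series(["w", "x", "y", "z"], index=["a", "b", "c", "d"]),
    }
)
print(df)         # row "d" has NaN in column "one"
print(df.dtypes)  # one float64, two integer, name object
```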
2. Importing data
References: [1] https://pandas.pydata.org/pandas-docs/stable/user_guide