Python study notes (4): pandas basics
pandas basics
Series
import pandas as pd
from pandas import Series, DataFrame
obj = Series([4, -7, 5, 3])
obj
0    4
1   -7
2    5
3    3
dtype: int64
obj.values
array([ 4, -7, 5, 3], dtype=int64)
obj.index
RangeIndex(start=0, stop=4, step=1)
obj[[1, 3]]  # select non-adjacent elements
1   -7
3    3
dtype: int64
obj[1:3]
1   -7
2    5
dtype: int64
pd.isnull(obj)
0    False
1    False
2    False
3    False
dtype: bool
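reindex can fill in values for labels that are new to the index; method='ffill' carries the last valid value forward (forward fill).
obj.reindex(range(5), method='ffill')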
0    4
1   -7
2    5
3    3
4    3
dtype: int64
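To see what the method argument changes, a minimal sketch (assuming a recent pandas) reindexes the same Series with and without method='ffill':

```python
import pandas as pd

obj = pd.Series([4, -7, 5, 3])

# Without a method, a label missing from the original index gets NaN,
# and the int64 values are upcast to float64 so they can hold it.
plain = obj.reindex(range(5))

# With method='ffill', each new label takes the last valid value before it.
filled = obj.reindex(range(5), method='ffill')

print(plain.tolist())   # [4.0, -7.0, 5.0, 3.0, nan]
print(filled.tolist())  # [4, -7, 5, 3, 3]
```

Forward filling needs a monotonic index, which range(5) is.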
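Slicing by label is a closed interval: both endpoints are included (positional slicing, as in obj[1:3], excludes the end).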
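A short sketch of the difference, using a hypothetical Series with string labels so that labels and positions cannot be confused:

```python
import pandas as pd

# Hypothetical data: string labels 'a'..'d'.
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = s.loc['a':'c']   # label slice: 'a', 'b' and 'c' are all included
by_position = s.iloc[0:2]   # positional slice: positions 0 and 1, end excluded

print(by_label.tolist())     # [10, 20, 30]
print(by_position.tolist())  # [10, 20]
```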
DataFrame
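data = {'state': ['asd', 'qwe', 'sdf', 'ert'],
        'year': [2000, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4]}
# Older pandas sorted dict keys, giving the column order pop, state, year shown below;
# pandas >= 0.23 on Python 3.6+ keeps insertion order (state, year, pop) instead.
data = DataFrame(data)
data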
|   | pop | state | year |
|---|-----|-------|------|
| 0 | 1.5 | asd | 2000 |
| 1 | 1.7 | qwe | 2001 |
| 2 | 3.6 | sdf | 2002 |
| 3 | 2.4 | ert | 2003 |
data.year  # attribute access: a bit more convenient than extracting a column in R
0    2000
1    2001
2    2002
3    2003
Name: year, dtype: int64
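Attribute access has a catch that shows up in this very example: pop is also the name of the DataFrame.pop method, so data.pop returns that method rather than the column. The sketch below rebuilds the same data and shows that bracket access is the safe choice:

```python
import pandas as pd

data = pd.DataFrame({'state': ['asd', 'qwe', 'sdf', 'ert'],
                     'year': [2000, 2001, 2002, 2003],
                     'pop': [1.5, 1.7, 3.6, 2.4]})

year_attr = data.year   # works: no method is called 'year'
pop_attr = data.pop     # the bound DataFrame.pop method, not the column
pop_col = data['pop']   # bracket access always returns the column

print(type(year_attr).__name__)  # Series
print(callable(pop_attr))        # True
print(pop_col.tolist())          # [1.5, 1.7, 3.6, 2.4]
```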
data['debt'] = range(4)
data
|   | pop | state | year | debt |
|---|-----|-------|------|------|
| 0 | 1.5 | asd | 2000 | 0 |
| 1 | 1.7 | qwe | 2001 | 1 |
| 2 | 3.6 | sdf | 2002 | 2 |
| 3 | 2.4 | ert | 2003 | 3 |
An Index is immutable: assigning to one of its elements raises a TypeError.
a = data.index
a[1] = 6
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-57677294f950> in <module>()
      1 a = data.index
----> 2 a[1] = 6

F:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   1668 
   1669     def __setitem__(self, key, value):
-> 1670         raise TypeError("Index does not support mutable operations")
   1671 
   1672     def __getitem__(self, key):

TypeError: Index does not support mutable operations
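Although single elements cannot be assigned, the whole index can still be replaced, or a relabelled copy made with rename. A minimal sketch, with hypothetical string labels:

```python
import pandas as pd

data = pd.DataFrame({'year': [2000, 2001, 2002, 2003]})

# Assigning to a single element of an Index raises TypeError.
try:
    data.index[1] = 6
    mutation_failed = False
except TypeError:
    mutation_failed = True

# Replacing the whole index with a new one of the same length is allowed.
data.index = ['a', 'b', 'c', 'd']

# rename returns a copy with selected labels changed; data keeps its index.
renamed = data.rename(index={'b': 'x'})

print(mutation_failed)      # True
print(list(data.index))     # ['a', 'b', 'c', 'd']
print(list(renamed.index))  # ['a', 'x', 'c', 'd']
```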
data.columns
Index(['pop', 'state', 'year', 'debt'], dtype='object')
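.ix provides label-based indexing (mixing labels and positions) and takes rows and columns at the same time. Without .ix, plain [] indexing selects either rows (with a slice, as in data[:3]) or columns (with a label, as in data['year']), but not both at once. Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0; use .loc for labels and .iloc for positions instead.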
data[:3]
|   | pop | state | year | debt |
|---|-----|-------|------|------|
| 0 | 1.5 | asd | 2000 | 0 |
| 1 | 1.7 | qwe | 2001 | 1 |
| 2 | 3.6 | sdf | 2002 | 2 |
data.iloc[:, :3]  # originally data.ix[:, :3]; .ix no longer exists in pandas >= 1.0
|   | pop | state | year |
|---|-----|-------|------|
| 0 | 1.5 | asd | 2000 |
| 1 | 1.7 | qwe | 2001 |
| 2 | 3.6 | sdf | 2002 |
| 3 | 2.4 | ert | 2003 |
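On a pandas version where .ix is gone, the same kinds of row-and-column selection can be written with .iloc (positions) and .loc (labels). A sketch on the same data, assuming pandas 1.0 or later:

```python
import pandas as pd

data = pd.DataFrame({'pop': [1.5, 1.7, 3.6, 2.4],
                     'state': ['asd', 'qwe', 'sdf', 'ert'],
                     'year': [2000, 2001, 2002, 2003],
                     'debt': range(4)})

# Position-based: all rows, first three columns (end position excluded).
first_three = data.iloc[:, :3]

# Label-based: rows 0..2 and columns 'pop' through 'year',
# both endpoints included because these are label slices.
block = data.loc[0:2, 'pop':'year']

# Rows and columns can also be picked with lists of labels.
picked = data.loc[[1, 3], ['state', 'debt']]

print(first_three.columns.tolist())  # ['pop', 'state', 'year']
print(block.shape)                   # (3, 3)
print(picked['state'].tolist())      # ['qwe', 'ert']
```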
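drop removes rows or columns: axis=0 means rows and axis=1 means columns. It returns a new object, so the original data is unchanged afterwards.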
data.drop(0,axis=0)
|   | pop | state | year | debt |
|---|-----|-------|------|------|
| 1 | 1.7 | qwe | 2001 | 1 |
| 2 | 3.6 | sdf | 2002 | 2 |
| 3 | 2.4 | ert | 2003 | 3 |
data.drop('year', axis=1)
|   | pop | state | debt |
|---|-----|-------|------|
| 0 | 1.5 | asd | 0 |
| 1 | 1.7 | qwe | 1 |
| 2 | 3.6 | sdf | 2 |
| 3 | 2.4 | ert | 3 |
data
|   | pop | state | year | debt |
|---|-----|-------|------|------|
| 0 | 1.5 | asd | 2000 | 0 |
| 1 | 1.7 | qwe | 2001 | 1 |
| 2 | 3.6 | sdf | 2002 | 2 |
| 3 | 2.4 | ert | 2003 | 3 |
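A quick check that drop leaves the original alone, also showing the index=/columns= keywords (available since pandas 0.21) as an alternative to axis. A sketch on a trimmed-down copy of data:

```python
import pandas as pd

data = pd.DataFrame({'pop': [1.5, 1.7, 3.6, 2.4],
                     'year': [2000, 2001, 2002, 2003]})

no_first_row = data.drop(0, axis=0)   # same as data.drop(index=0)
no_year = data.drop(columns='year')   # same as data.drop('year', axis=1)
fewer_rows = data.drop([0, 1])        # several labels at once via a list

print(data.shape)                 # (4, 2): the original is untouched
print(no_first_row.shape)         # (3, 2)
print(no_year.columns.tolist())   # ['pop']
print(fewer_rows.index.tolist())  # [2, 3]
```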
import numpy as np
df = DataFrame(np.arange(9).reshape(3, 3))
df
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0 | 1 | 2 |
| 1 | 3 | 4 | 5 |
| 2 | 6 | 7 | 8 |
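applymap() applies a function to every element of a DataFrame, while apply() applies a function to each one-dimensional slice: each column by default, or each row with axis=1. (Since pandas 2.1, applymap is deprecated in favor of the equivalent DataFrame.map.)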
df.applymap(lambda x: '%.2f' % x)
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0.00 | 1.00 | 2.00 |
| 1 | 3.00 | 4.00 | 5.00 |
| 2 | 6.00 | 7.00 | 8.00 |
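To contrast the two, a sketch that uses apply to reduce each column and each row, and formats every element with DataFrame.map where it exists (pandas 2.1+), falling back to applymap on older versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))

# apply: the function receives one column at a time (axis=0, the default)...
col_range = df.apply(lambda col: col.max() - col.min())
# ...or one row at a time with axis=1.
row_sum = df.apply(lambda row: row.sum(), axis=1)

# Element-wise: DataFrame.map on pandas >= 2.1, applymap on older versions.
elementwise = df.map if hasattr(df, 'map') else df.applymap
formatted = elementwise(lambda x: '%.2f' % x)

print(col_range.tolist())    # [6, 6, 6]
print(row_sum.tolist())      # [3, 12, 21]
print(formatted.iloc[1, 0])  # 3.00
```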
data.sort_values(by='pop')  # sort by one column
|   | pop | state | year | debt |
|---|-----|-------|------|------|
| 0 | 1.5 | asd | 2000 | 0 |
| 1 | 1.7 | qwe | 2001 | 1 |
| 3 | 2.4 | ert | 2003 | 3 |
| 2 | 3.6 | sdf | 2002 | 2 |
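sort_values has a few more options that come up often: descending order with ascending=False, and a list of columns for multi-key sorting; sort_index puts the rows back in label order. A sketch on the same kind of data:

```python
import pandas as pd

data = pd.DataFrame({'pop': [1.5, 1.7, 3.6, 2.4],
                     'state': ['asd', 'qwe', 'sdf', 'ert'],
                     'year': [2000, 2001, 2002, 2003]})

desc = data.sort_values(by='pop', ascending=False)  # largest first
# Several keys: rows are ordered by 'state', ties broken by 'year'.
multi = data.sort_values(by=['state', 'year'])
# sort_index restores the original order of the row labels.
restored = desc.sort_index()

print(desc.index.tolist())      # [2, 3, 1, 0]
print(multi['state'].tolist())  # ['asd', 'ert', 'qwe', 'sdf']
```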
data.describe()
|       | pop | year | debt |
|-------|-----|------|------|
| count | 4.000000 | 4.000000 | 4.000000 |
| mean | 2.300000 | 2001.500000 | 1.500000 |
| std | 0.948683 | 1.290994 | 1.290994 |
| min | 1.500000 | 2000.000000 | 0.000000 |
| 25% | 1.650000 | 2000.750000 | 0.750000 |
| 50% | 2.050000 | 2001.500000 | 1.500000 |
| 75% | 2.700000 | 2002.250000 | 2.250000 |
| max | 3.600000 | 2003.000000 | 3.000000 |
df.isin([1])
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | False | True | False |
| 1 | False | False | False |
| 2 | False | False | False |
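None and NaN are both treated as NA (missing values). df.shape is an attribute, written without parentheses, and is the counterpart of R's dim().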
df.shape
(3, 3)
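A small sketch illustrating both points: None and np.nan are both reported as missing by isnull, and shape unpacks into rows and columns much like dim() in R:

```python
import numpy as np
import pandas as pd

numeric = pd.Series([1.0, None, np.nan, 4.0])  # None becomes NaN in a float Series
text = pd.Series(['a', None, 'c'])             # None is kept as-is in an object Series

df = pd.DataFrame(np.arange(9).reshape(3, 3))
n_rows, n_cols = df.shape  # an attribute, so no parentheses

print(numeric.isnull().tolist())  # [False, True, True, False]
print(text.isnull().tolist())     # [False, True, False]
print(n_rows, n_cols)             # 3 3
```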
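dropna removes missing values. To have some missing values to work with, the top-left 2x2 block of df is first set to None: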
df.loc[:1, :1] = None  # originally df.ix[:1, :1]; a .loc label slice includes rows 0-1 and columns 0-1
df
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | NaN | NaN | 2 |
| 1 | NaN | NaN | 5 |
| 2 | 6.0 | 7.0 | 8 |
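The note mentions dropna without calling it; as a sketch, here is what it does to the frame above (rebuilt explicitly so the block runs on its own), with a few of its common options:

```python
import numpy as np
import pandas as pd

# The frame shown above: NaN in the top-left 2x2 block.
df = pd.DataFrame({0: [np.nan, np.nan, 6.0],
                   1: [np.nan, np.nan, 7.0],
                   2: [2, 5, 8]})

rows_complete = df.dropna()         # drop rows containing any NA
cols_complete = df.dropna(axis=1)   # drop columns containing any NA
rows_all_na = df.dropna(how='all')  # drop only rows that are entirely NA
keep_two = df.dropna(thresh=2)      # keep rows with at least 2 non-NA values

print(rows_complete.index.tolist())    # [2]
print(cols_complete.columns.tolist())  # [2]
print(rows_all_na.shape)               # (3, 3): no row is entirely NA
print(keep_two.index.tolist())         # [2]
```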
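fillna also accepts a dict, which fills each column with its own value: the keys are column labels and the values are the fill values.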
df.fillna({0:11, 1:22})
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 11.0 | 22.0 | 2 |
| 1 | 11.0 | 22.0 | 5 |
| 2 | 6.0 | 7.0 | 8 |
df
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | NaN | NaN | 2 |
| 1 | NaN | NaN | 5 |
| 2 | 6.0 | 7.0 | 8 |
df.fillna({0:11, 1:22}, inplace=True)
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 11.0 | 22.0 | 2 |
| 1 | 11.0 | 22.0 | 5 |
| 2 | 6.0 | 7.0 | 8 |
df
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 11.0 | 22.0 | 2 |
| 1 | 11.0 | 22.0 | 5 |
| 2 | 6.0 | 7.0 | 8 |
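With inplace=True the object itself is modified and no copy is produced; without it, fillna leaves df unchanged, as the two df outputs above show.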
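A minimal sketch of the difference, on a smaller hypothetical frame: without inplace the result has to be captured, while with inplace=True the original changes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [np.nan, 6.0], 1: [np.nan, 7.0]})

filled = df.fillna({0: 11, 1: 22})          # a new, filled DataFrame
missing_before = int(df.isnull().sum().sum())  # df still holds its 2 NaN

df.fillna({0: 11, 1: 22}, inplace=True)     # modifies df directly
missing_after = int(df.isnull().sum().sum())

# Reassigning the returned copy reaches the same end state without inplace:
# df = df.fillna({0: 11, 1: 22})

print(missing_before, missing_after)  # 2 0
print(filled.equals(df))              # True
```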