3-1 Pandas: Overview
The data used in the Pandas chapters can be downloaded from the following link:
https://files.cnblogs.com/files/AI-robort/Titanic_Data-master.zip
Pandas: a data analysis and processing library
In [1]:
import pandas as pd
In [4]:
df=pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
.head(): returns the first few rows (five by default); pass a number to choose how many rows to return.
In [5]:
df.head(6)
Out[5]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
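To try `.head()` without downloading the Titanic file, here is a minimal sketch on a small made-up DataFrame. The column names and values are hypothetical stand-ins, not part of the original data.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic CSV.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f", "g"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0],
})

first_default = toy.head()   # no argument: the first 5 rows
first_three = toy.head(3)    # explicit count: the first 3 rows

print(len(first_default))  # 5
print(len(first_three))    # 3
```

Asking for more rows than exist simply returns the whole table, so `.head()` is safe to call on small frames too.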
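.info(): prints a summary of the DataFrame: the index range, each column's name, non-null count and dtype, and the memory usage.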
In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
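The non-null counts in the `.info()` output show that Age, Cabin and Embarked contain missing values. As a rough sketch, the same counts can be obtained as data with `count()` and `isna().sum()`. The toy frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with some missing entries.
toy = pd.DataFrame({
    "age": [22.0, None, 26.0, None],
    "cabin": ["C85", None, None, None],
    "fare": [7.25, 71.28, 7.93, 53.10],
})

non_null = toy.count()       # non-null entries per column, as reported by info()
missing = toy.isna().sum()   # missing entries per column

print(non_null)
print(missing)
```

For every column, the non-null count plus the missing count equals the number of rows.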
Inspecting the table's attributes and details
In [7]:
df.index  # the row index
Out[7]:
RangeIndex(start=0, stop=891, step=1)
In [8]:
df.columns  # the name of each column
Out[8]:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
In [9]:
df.dtypes  # the data type of each column
Out[9]:
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
In [10]:
df.values  # the underlying data as a NumPy array, one inner array per row
Out[10]:
array([[1, 0, 3, ..., 7.25, nan, 'S'],
       [2, 1, 1, ..., 71.2833, 'C85', 'C'],
       [3, 1, 3, ..., 7.925, nan, 'S'],
       ...,
       [889, 0, 3, ..., 23.45, nan, 'S'],
       [890, 1, 1, ..., 30.0, 'C148', 'C'],
       [891, 0, 3, ..., 7.75, nan, 'Q']], dtype=object)
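The four attributes above can be explored on a small frame as well. This is only a sketch with invented column names, meant to show what each attribute returns.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
toy = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": [22.0, 38.0, 26.0],
    "pclass": [3, 1, 3],
})

print(toy.index)           # RangeIndex(start=0, stop=3, step=1)
print(list(toy.columns))   # ['name', 'age', 'pclass']
print(toy.dtypes)          # one dtype per column
print(toy.values.shape)    # (3, 3)
```

Because the columns mix strings and numbers, `.values` has to fall back to a general `object` array, which is what the Titanic output above shows as well. The pandas documentation recommends `DataFrame.to_numpy()` as the preferred way to get this array.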
Creating a DataFrame yourself
In [11]:
data = {'country': ['aaa', 'bbb', 'ccc'], 'population': [10, 12, 14]}
df_data = pd.DataFrame(data)
df_data
Out[11]:
| | country | population |
---|---|---|
0 | aaa | 10 |
1 | bbb | 12 |
2 | ccc | 14 |
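The dictionary above is column-oriented: each key becomes a column and each list supplies that column's values. As a small sketch, the same table can also be built row by row from a list of dictionaries. The variable names `by_column` and `by_row` are made up for this example.

```python
import pandas as pd

# Column-oriented: each key is a column, each list holds that column's values.
by_column = pd.DataFrame({
    "country": ["aaa", "bbb", "ccc"],
    "population": [10, 12, 14],
})

# Row-oriented: each dictionary describes one row.
by_row = pd.DataFrame([
    {"country": "aaa", "population": 10},
    {"country": "bbb", "population": 12},
    {"country": "ccc", "population": 14},
])

# Both constructions should describe the same table.
print(by_column.equals(by_row))
```

Which form is more convenient usually depends on how the raw data arrives: as whole columns, or as one record at a time.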
df_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
country       3 non-null object
population    3 non-null int64
dtypes: int64(1), object(1)
memory usage: 128.0+ bytes
In [15]:
age = df['Age']  # select a single column by name
age[:5]  # show the first 5 rows
Out[15]:
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
Series: a single row or column of a DataFrame
In [16]:
age.index
Out[16]:
RangeIndex(start=0, stop=891, step=1)
In [17]:
age.values[:5]
Out[17]:
array([22., 38., 26., 35., 35.])
In [18]:
df.head()
Out[18]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
In [19]:
df['Age'][:5]
Out[19]:
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
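Selecting one column with a single label returns a Series, which carries its own index, values and name. The sketch below, on invented data, contrasts that with selecting a list of labels, which keeps the result a DataFrame.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": [22.0, 38.0, 26.0],
})

age = toy["age"]           # one label: a Series
age_frame = toy[["age"]]   # a list of labels: a one-column DataFrame

print(type(age).__name__)        # Series
print(type(age_frame).__name__)  # DataFrame
print(age.name)                  # age
print(list(age.index))           # [0, 1, 2]
```

The Series shares the DataFrame's row index, which is why `age.index` above reports the same `RangeIndex` as `df.index`.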
Changing the index
In [20]:
df = df.set_index('Name')
df.head()
Out[20]:
| | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|
Name | |||||||||||
Braund, Mr. Owen Harris | 1 | 0 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
Cumings, Mrs. John Bradley (Florence Briggs Thayer) | 2 | 1 | 1 | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
Heikkinen, Miss. Laina | 3 | 1 | 3 | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
Futrelle, Mrs. Jacques Heath (Lily May Peel) | 4 | 1 | 1 | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
Allen, Mr. William Henry | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
In [21]:
df['Age'][:5]
Out[21]:
Name
Braund, Mr. Owen Harris                                22.0
Cumings, Mrs. John Bradley (Florence Briggs Thayer)    38.0
Heikkinen, Miss. Laina                                 26.0
Futrelle, Mrs. Jacques Heath (Lily May Peel)           35.0
Allen, Mr. William Henry                               35.0
Name: Age, dtype: float64
In [25]:
age = df['Age']
age[:5]
Out[25]:
Name
Braund, Mr. Owen Harris                                22.0
Cumings, Mrs. John Bradley (Florence Briggs Thayer)    38.0
Heikkinen, Miss. Laina                                 26.0
Futrelle, Mrs. Jacques Heath (Lily May Peel)           35.0
Allen, Mr. William Henry                               35.0
Name: Age, dtype: float64
In [26]:
age['Allen, Mr. William Henry']  # look up the value by its index label
Out[26]:
35.0
In [27]:
age = age + 10  # add 10 to every element
age[:5]
Out[27]:
Name
Braund, Mr. Owen Harris                                32.0
Cumings, Mrs. John Bradley (Florence Briggs Thayer)    48.0
Heikkinen, Miss. Laina                                 36.0
Futrelle, Mrs. Jacques Heath (Lily May Peel)           45.0
Allen, Mr. William Henry                               45.0
Name: Age, dtype: float64
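The three steps above (setting the index, looking up by label, and element-wise arithmetic) can be sketched on a tiny frame. The names and ages here are invented, and `.loc` is shown as the explicit label-based form of the same lookup.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [22.0, 38.0, 26.0],
})

by_name = toy.set_index("name")   # returns a new DataFrame; toy is unchanged
age = by_name["age"]

print(age["Bob"])       # 38.0
print(age.loc["Bob"])   # 38.0, the explicit label-based form

older = age + 10        # element-wise; builds a new Series
print(older["Bob"])     # 48.0
print(age["Bob"])       # 38.0, the original Series is untouched
```

Since `set_index` returns a new object by default, the notebook assigns the result back with `df = df.set_index('Name')`. In the same way, `age = age + 10` rebinds the name `age` and leaves the `Age` column inside `df` as it was.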
Summary statistics on the values
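In [28]:
age.mean()  # age was increased by 10 above, so this is 10 more than the Age mean reported by df.describe()
Out[28]:
39.69911764705882
In [29]:
age.max()
Out[29]:
90.0
In [30]:
age.min()
Out[30]:
10.42
In [31]:
df.describe()  # basic summary statistics for every numeric column in one call
Out[31]:
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |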
---|---|---|---|---|---|---|---|
count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
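Two details of the output above are worth checking by hand: the Age count is 714 rather than 891 because missing values are skipped, and the text columns are absent because `describe()` summarizes only numeric columns by default. A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical toy data with one missing age and one text column.
toy = pd.DataFrame({
    "age": [20.0, 30.0, None, 40.0],
    "fare": [10.0, 20.0, 30.0, 40.0],
    "sex": ["male", "female", "male", "female"],
})

print(toy["age"].mean())    # 30.0, the missing value is skipped
print(toy["age"].count())   # 3

summary = toy.describe()
print(summary.loc["count", "age"])   # 3.0
print(summary.loc["mean", "fare"])   # 25.0
print(list(summary.columns))         # ['age', 'fare']

shifted = toy["age"] + 10
print(shifted.mean())       # 40.0, the mean moves by the same constant
```

The last lines mirror the notebook: adding 10 to every age raises the mean, minimum and maximum by exactly 10, which explains why `age.mean()` returns 39.699 while `df.describe()` reports 29.699 for Age.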