[Python in Practice 04] Reading Files and Using the split Method


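So far, the data we have used was defined by hand in the console. When processing data with Python, however, we often need to work with data stored in files, so it is important to know how to read a file. Let's see how to read a file with Python.

First, prepare a file that we will read shortly. Here I put it in D:\python\file, with the name sketch.txt. Its contents are as follows: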

Man: Is this the right room for an argument?
Other Man: I've told you once.
Man: No you haven't!
Other Man: Yes I have.
Man: When?
Other Man: Just now.
Man: No you didn't!
Other Man: Yes I did!
Man: You didn't!
Other Man: I'm telling you, I did!
Man: You did not!
Other Man: Oh I'm sorry, is this a five minute argument, or the full half hour?
Man: Ah! (taking out his wallet and paying) Just the five minutes.
Other Man: Just the five minutes. Thank you.
Other Man: Anyway, I did.
Man: You most certainly did not!
Other Man: Now let's get one thing quite clear: I most definitely told you!
Man: Oh no you didn't!
Other Man: Oh yes I did!
Man: Oh no you didn't!
Other Man: Oh yes I did!
Man: Oh look, this isn't an argument!
(pause)
Other Man: Yes it is!
Man: No it isn't!
(pause)
Man: It's just contradiction!
Other Man: No it isn't!
Man: It IS!
Other Man: It is NOT!
Man: You just contradicted me!
Other Man: No I didn't!
Man: You DID!
Other Man: No no no!
Man: You did just then!
Other Man: Nonsense!
Man: (exasperated) Oh, this is futile!!
(pause)
Other Man: No it isn't!
Man: Yes it is!

 

 

Reading the File

The first thing to do is point the Python shell at the folder that contains the file, so that the file can be found. We do this with the following commands:

>>> import os
>>> os.getcwd()
'D:\\Python33'
>>> os.chdir('d:\\python\\file')
>>> os.getcwd()
'd:\\python\\file'
>>> 
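Note that we first import the os module, then switch to the folder we need with os.chdir(), and finally confirm the change with os.getcwd(). Here I have switched to the d:\python\file folder.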
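Changing the working directory is one option; another is to open the file by its full path and leave the working directory alone. A minimal sketch of both, assuming a temporary folder created with tempfile stands in for D:\python\file so that it runs on any machine:

```python
import os
import tempfile

# Assumption: a temporary folder stands in for D:\python\file.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'sketch.txt')

# Write a two-line sample file to read back.
with open(path, 'w') as f:
    f.write("Man: Is this the right room for an argument?\n")
    f.write("Other Man: I've told you once.\n")

# Option 1: change the working directory, then open by bare file name.
os.chdir(folder)
data = open('sketch.txt')
first_line = data.readline()
data.close()

# Option 2: skip os.chdir() and open by absolute path instead.
data = open(path)
same_line = data.readline()
data.close()

print(first_line == same_line)  # True
```

Opening by full path avoids depending on where the shell happens to be, which matters once a script is run from different folders.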

 

 

Now we can try to read the file's contents. We start by reading one line at a time:

>>> data = open('sketch.txt')
>>> print(data.readline(),end='')
Man: Is this the right room for an argument?
>>> print(data.readline(),end='')
Other Man: I've told you once.
>>> 
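Each string returned by readline() still ends with its newline character, which is why the calls above pass end='' to print(); once the end of the file is reached, readline() returns an empty string. A small check on a hypothetical two-line copy of the file:

```python
import os
import tempfile

# Hypothetical sample file holding the first two lines of the dialogue.
path = os.path.join(tempfile.mkdtemp(), 'sketch.txt')
with open(path, 'w') as f:
    f.write("Man: Is this the right room for an argument?\n")
    f.write("Other Man: I've told you once.\n")

data = open(path)
line1 = data.readline()  # includes the trailing '\n'
line2 = data.readline()
line3 = data.readline()  # '' once the end of the file is reached
data.close()

print(repr(line1))  # 'Man: Is this the right room for an argument?\n'
print(repr(line3))  # ''
```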
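The session above read the first two lines of the file. Python also provides other methods for working with files; for example, seek() moves back to the beginning of the file: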

>>> data.seek(0)
0
>>>
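To confirm that seek(0) really rewinds, the sketch below reads two lines, seeks back, and reads the first line again. It assumes io.StringIO as a stand-in for the opened sketch.txt; it supports readline() and seek() the same way a text file object does.

```python
import io

# Assumption: io.StringIO stands in for open('sketch.txt').
data = io.StringIO("Man: Is this the right room for an argument?\n"
                   "Other Man: I've told you once.\n")

first = data.readline()
data.readline()          # move past the second line
position = data.seek(0)  # rewind; returns the new position
again = data.readline()  # reads the first line again

print(position)        # 0
print(first == again)  # True
```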
Then we can use a for loop to print the entire file:
>>> for each_line in data:
	print(each_line,end='')

	
Man: Is this the right room for an argument?
Other Man: I've told you once.
Man: No you haven't!
Other Man: Yes I have.
Man: When?
Other Man: Just now.
Man: No you didn't!
Other Man: Yes I did!
Man: You didn't!
Other Man: I'm telling you, I did!
Man: You did not!
Other Man: Oh I'm sorry, is this a five minute argument, or the full half hour?
Man: Ah! (taking out his wallet and paying) Just the five minutes.
Other Man: Just the five minutes. Thank you.
Other Man: Anyway, I did.
Man: You most certainly did not!
Other Man: Now let's get one thing quite clear: I most definitely told you!
Man: Oh no you didn't!
Other Man: Oh yes I did!
Man: Oh no you didn't!
Other Man: Oh yes I did!
Man: Oh look, this isn't an argument!
(pause)
Other Man: Yes it is!
Man: No it isn't!
(pause)
Man: It's just contradiction!
Other Man: No it isn't!
Man: It IS!
Other Man: It is NOT!
Man: You just contradicted me!
Other Man: No I didn't!
Man: You DID!
Other Man: No no no!
Man: You did just then!
Other Man: Nonsense!
Man: (exasperated) Oh, this is futile!!
(pause)
Other Man: No it isn't!
Man: Yes it is!
>>> 
This reads out and prints every line of the file. Finally, don't forget to close the file we opened:
>>> data.close()
>>> 
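Forgetting close() is an easy mistake. A common alternative, not used in the original, is a with statement, which closes the file automatically when the block ends, even if an exception is raised inside it. A minimal sketch, again assuming a temporary copy of the file instead of D:\python\file\sketch.txt:

```python
import os
import tempfile

# Hypothetical location for the sample file.
path = os.path.join(tempfile.mkdtemp(), 'sketch.txt')
with open(path, 'w') as f:
    f.write("Man: Is this the right room for an argument?\n")
    f.write("Other Man: I've told you once.\n")

lines = []
with open(path) as data:      # the file is closed when this block exits
    for each_line in data:
        print(each_line, end='')
        lines.append(each_line)

print(data.closed)  # True
```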

 

Processing the Data

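Once the data has been read, we can do some simple processing on it. Looking at the data above, we can see that it is a dialogue between two people, in this format: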

Man: Is this the right room for an argument?
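Each line begins with the speaker followed by a colon, so we can use that colon to separate who is speaking from what they say. The method for this is split; to split on the colon: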
each_line.split(":")
Splitting on the colon breaks the line into parts. Assuming there are exactly two, we can unpack them into two variables:
(role,line_spoke) = each_line.split(":")
The speaker is now stored in role and what they said in line_spoke. Rewriting the for loop above:
>>> data = open("sketch.txt")
>>> for each_line in data:
	(role,line_spoke) = each_line.split(":")
	print(role,end='')
	print(' said: ',end='')
	print(line_spoke,end='')

	
Man said:  Is this the right room for an argument?
Other Man said:  I've told you once.
Man said:  No you haven't!
Other Man said:  Yes I have.
Man said:  When?
Other Man said:  Just now.
Man said:  No you didn't!
Other Man said:  Yes I did!
Man said:  You didn't!
Other Man said:  I'm telling you, I did!
Man said:  You did not!
Other Man said:  Oh I'm sorry, is this a five minute argument, or the full half hour?
Man said:  Ah! (taking out his wallet and paying) Just the five minutes.
Other Man said:  Just the five minutes. Thank you.
Other Man said:  Anyway, I did.
Man said:  You most certainly did not!
Traceback (most recent call last):
  File "", line 2, in <module>
    (role,line_spoke) = each_line.split(":")
ValueError: too many values to unpack (expected 2)
>>> 
An error occurred while processing the data. It happened on the line after "Man said: You most certainly did not!". Let's look at what that line contains:
Other Man: Now let's get one thing quite clear: I most definitely told you!
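This line contains more than one colon. Because we split on every colon, the line is broken into three parts, but we only provided two variables to unpack them into, so Python raises the error too many values to unpack (expected 2). How can we fix this?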
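The failure can be reproduced on that single line, without the loop. This sketch just splits the problem line and attempts the same two-variable unpacking; the exact wording of the ValueError message varies between Python versions, so it is not shown here.

```python
line = "Other Man: Now let's get one thing quite clear: I most definitely told you!\n"

parts = line.split(":")
print(len(parts))  # 3
print(parts[0])    # Other Man

failed = False
try:
    (role, line_spoke) = line.split(":")
except ValueError as err:
    # Three parts cannot be unpacked into two names.
    failed = True
    print("ValueError:", err)
```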

 

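What we want is to split only at the first colon after the speaker's name. But how do we express that? Let's first check the official documentation for split to see whether it offers a solution: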

>>> help(each_line.split)
Help on built-in function split:

split(...)
    S.split(sep=None, maxsplit=-1) -> list of strings
    
    Return a list of the words in S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.

>>> 
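Here we can see a parameter called maxsplit, which limits how many splits are performed. Setting it to 1 means the string is split at most once, into at most two parts, which is exactly what we need. Let's modify the output code above accordingly: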
data = open('sketch.txt')

for each_line in data:
    (role,line_spoke) = each_line.split(":",1)
    print(role,end='')
    print(' said: ',end='')
    print(line_spoke,end='')


data.close()
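To isolate what maxsplit=1 changes, separate from the script above, here is the problem line from earlier split both ways:

```python
line = "Other Man: Now let's get one thing quite clear: I most definitely told you!\n"

print(len(line.split(":")))     # 3  (every colon is a split point)
print(len(line.split(":", 1)))  # 2  (at most one split)

(role, line_spoke) = line.split(":", 1)
print(role)  # Other Man
# line_spoke keeps the second colon as ordinary text.
```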
Running the script gives:
>>> ================================ RESTART ================================
>>> 
Man said:  Is this the right room for an argument?
Other Man said:  I've told you once.
Man said:  No you haven't!
Other Man said:  Yes I have.
Man said:  When?
Other Man said:  Just now.
Man said:  No you didn't!
Other Man said:  Yes I did!
Man said:  You didn't!
Other Man said:  I'm telling you, I did!
Man said:  You did not!
Other Man said:  Oh I'm sorry, is this a five minute argument, or the full half hour?
Man said:  Ah! (taking out his wallet and paying) Just the five minutes.
Other Man said:  Just the five minutes. Thank you.
Other Man said:  Anyway, I did.
Man said:  You most certainly did not!
Other Man said:  Now let's get one thing quite clear: I most definitely told you!
Man said:  Oh no you didn't!
Other Man said:  Oh yes I did!
Man said:  Oh no you didn't!
Other Man said:  Oh yes I did!
Man said:  Oh look, this isn't an argument!
Traceback (most recent call last):
  File "D:\python\file\sketch.py", line 4, in <module>
    (role,line_spoke) = each_line.split(":",1)
ValueError: need more than 1 value to unpack
>>> 
There is an error again, but it is not the same one as before. This time it happens after the line "Man said: Oh look, this isn't an argument!". Finding that spot in sketch.txt, the next line is:
(pause)
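This line contains no colon at all, yet we try to split it on a colon, so the unpacking fails. Before splitting, then, we should check whether the current line contains a colon. Python's string method find can do this; it is used like this: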
>>> each_line = "Hello World"
>>> each_line.find(":")
-1
>>> each_line = "Man:Hello!"
>>> each_line.find(":")
3
>>> 
find returns -1 when there is no colon; otherwise it returns the position (index) of the first colon. With this, we modify the earlier code as follows:
data = open('sketch.txt')

for each_line in data:
    if not each_line.find(":")==-1:
        
        (role,line_spoke) = each_line.split(":",1)
        print(role,end='')
        print(' said: ',end='')
        print(line_spoke,end='')


data.close()
In the if condition, we use not to negate the comparison, so the body runs only when a colon is found. The output is:
>>> ================================ RESTART ================================
>>> 
Man said:  Is this the right room for an argument?
Other Man said:  I've told you once.
Man said:  No you haven't!
Other Man said:  Yes I have.
Man said:  When?
Other Man said:  Just now.
Man said:  No you didn't!
Other Man said:  Yes I did!
Man said:  You didn't!
Other Man said:  I'm telling you, I did!
Man said:  You did not!
Other Man said:  Oh I'm sorry, is this a five minute argument, or the full half hour?
Man said:  Ah! (taking out his wallet and paying) Just the five minutes.
Other Man said:  Just the five minutes. Thank you.
Other Man said:  Anyway, I did.
Man said:  You most certainly did not!
Other Man said:  Now let's get one thing quite clear: I most definitely told you!
Man said:  Oh no you didn't!
Other Man said:  Oh yes I did!
Man said:  Oh no you didn't!
Other Man said:  Oh yes I did!
Man said:  Oh look, this isn't an argument!
Other Man said:  Yes it is!
Man said:  No it isn't!
Man said:  It's just contradiction!
Other Man said:  No it isn't!
Man said:  It IS!
Other Man said:  It is NOT!
Man said:  You just contradicted me!
Other Man said:  No I didn't!
Man said:  You DID!
Other Man said:  No no no!
Man said:  You did just then!
Other Man said:  Nonsense!
Man said:  (exasperated) Oh, this is futile!!
Other Man said:  No it isn't!
Man said:  Yes it is!
>>> 
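As you can see, with this rule every line that contains a colon is now displayed, and the "(pause)" lines, which have no colon, are skipped.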
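The steps in this article can be combined into one small function. The sketch below is one way to do it, under a few assumptions: the helper name parse_dialogue is hypothetical, it uses the in operator instead of find() (both test whether a colon is present), and it strips surrounding whitespace from what each speaker says, which the original code does not do.

```python
import io

def parse_dialogue(lines):
    """Return (role, line_spoke) pairs, skipping lines without a colon.

    Hypothetical helper combining the steps from this article.
    """
    pairs = []
    for each_line in lines:
        if ':' in each_line:  # same test as each_line.find(':') != -1
            (role, line_spoke) = each_line.split(':', 1)
            pairs.append((role, line_spoke.strip()))
    return pairs

# io.StringIO stands in for open('sketch.txt'); a file object iterates the same way.
sample = io.StringIO(
    "Man: Is this the right room for an argument?\n"
    "(pause)\n"
    "Other Man: Now let's get one thing quite clear: I most definitely told you!\n"
)
for role, line_spoke in parse_dialogue(sample):
    print(role + ' said: ' + line_spoke)
```

To run it on the real file, pass an open file object, for example by calling parse_dialogue(data) inside with open('sketch.txt') as data:.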



 
