How to process each word in a file with Python 3
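'''
Created on Dec 21, 2012
Process each word in a file
@author: liury_lab
'''
import re

# Treat every whitespace-separated run of characters as a word.
# The built-in open() replaces codecs.open(..., 'rU', 'UTF-8'): mode 'U' was
# removed in Python 3.11, and text mode already normalizes newlines.
with open('d:/text.txt', encoding='utf-8') as the_file:
    for line in the_file:
        for word in line.split():
            print(word, end='|')

# If the definition of a word changes, use a regular expression instead,
# e.g. a word is a sequence of letters, digits, hyphens, or single quotes.
print()
print('************************************************************************')
re_word = re.compile(r"[\w'-]+")  # raw string avoids the invalid '\w' escape warning
with open('d:/text.txt', encoding='utf-8') as the_file:
    for line in the_file:
        for word in re_word.finditer(line):
            print(word.group(0), end='|')


# Wrap the loops in a generator so callers can iterate over words directly.
# line_to_words decides how one line is split; the default is str.split.
def words_of_file(file_path, line_to_words=str.split):
    with open(file_path, encoding='utf-8') as the_file:  # use file_path, not a hard-coded path
        for line in the_file:
            for word in line_to_words(line):
                yield word


print()
print('************************************************************************')
for word in words_of_file('d:/text.txt'):
    print(word, end='|')


# Same generator, but words are defined by a regular expression.
# (The original also opened the file here and never closed it; that was unnecessary.)
def words_by_re(file_path, repattern=r"[\w'-]+"):
    re_word = re.compile(repattern)  # honor the repattern argument

    def line_to_words(line):
        for mo in re_word.finditer(line):
            yield mo.group(0)  # the original book used return here; the result was wrong, so it was changed to yield

    return words_of_file(file_path, line_to_words)


print()
print('************************************************************************')
for word in words_by_re('d:/text.txt'):
    print(word, end='|')

This article was originally published on www.iplaypy.com (编橙之家) by member 繁星123.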
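The listing above reads from a fixed path, d:/text.txt, so it only runs where that file exists. Below is a minimal sketch of the same pluggable-splitter idea, assuming a hypothetical helper words_of_stream that takes an already-open text stream instead of a path. io.StringIO stands in for the file, which makes the difference between the whitespace definition of a word and the regular-expression definition easy to see.

```python
import io
import re
from typing import Callable, Iterable, Iterator, TextIO

# Word pattern from the article: letters, digits, underscore, hyphen, apostrophe.
WORD_RE = re.compile(r"[\w'-]+")


def words_of_stream(stream: TextIO,
                    line_to_words: Callable[[str], Iterable[str]] = str.split) -> Iterator[str]:
    """Hypothetical helper: yield every word of an open text stream, line by line."""
    for line in stream:
        yield from line_to_words(line)


def regex_words(line: str) -> Iterator[str]:
    """Split one line with WORD_RE instead of on whitespace."""
    for mo in WORD_RE.finditer(line):
        yield mo.group(0)


sample = "Don't stop-and-go, please!\nIt's 2012.\n"

# Whitespace splitting keeps trailing punctuation attached to words.
print(list(words_of_stream(io.StringIO(sample))))
# → ["Don't", 'stop-and-go,', 'please!', "It's", '2012.']

# The regex definition drops punctuation but keeps apostrophes and hyphens.
print(list(words_of_stream(io.StringIO(sample), regex_words)))
# → ["Don't", 'stop-and-go', 'please', "It's", '2012']
```

Passing the splitting function as a parameter, as words_of_file does in the article, keeps the reading loop in one place: changing what counts as a word only means supplying a different line_to_words.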