Counting word frequency in a text string with Python
```python
# word frequency in a text
# originally tested with Python24   vegaseat   25aug2005
# print() and sorted() below are Python 3 syntax

# Chinese wisdom ...
str1 = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""

print("Original string:")
print(str1)

# create a list of words separated at whitespaces
wordList1 = str1.split(None)

# strip any punctuation marks and build modified word list
# start with an empty list
wordList2 = []
for word1 in wordList1:
    # last character of each word
    lastchar = word1[-1:]
    # use a list of punctuation marks
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    # build a wordList of lower case modified words
    wordList2.append(word2.lower())

print("Word list created from modified string:")
print(wordList2)

# create a wordfrequency dictionary
# start with an empty dictionary
freqD2 = {}
for word2 in wordList2:
    freqD2[word2] = freqD2.get(word2, 0) + 1

# create a list of keys and sort the list
# all words are lower case already
keyList = sorted(freqD2.keys())

print("Frequency of each word in the word list (sorted):")
for key2 in keyList:
    print("%-10s %d" % (key2, freqD2[key2]))
```
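The counting step in the snippet above can also be sketched with `collections.Counter` from the standard library. This is an alternative, not part of the original snippet. It assumes Python 3, and the helper name `word_frequencies` is made up for illustration. Unlike the original, it strips punctuation from both ends of each word using `string.punctuation`, so a word such as `"(car)"` would be counted as `car`.

```python
from collections import Counter
import string


def word_frequencies(text):
    """Return a Counter of lower-cased words with edge punctuation removed."""
    words = []
    for raw in text.split():
        # Strip ASCII punctuation from both ends of the token, then lower-case it.
        word = raw.strip(string.punctuation).lower()
        if word:  # skip tokens that were punctuation only
            words.append(word)
    return Counter(words)


text = """Man who run in front of car, get tired.
Man who run behind car, get exhausted."""

freq = word_frequencies(text)

# Print the words alphabetically with their counts.
for word in sorted(freq):
    print("%-10s %d" % (word, freq[word]))
```

`Counter` also provides `most_common(n)`, which may be handier than sorting the keys when the goal is to list the most frequent words first.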