Converting a Corpus into a Computable Form with Python


Field: natural language processing. Purpose: use Python to turn a text corpus into a numeric form that programs can compute on.
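As a minimal illustration of what a "computable form" can look like, the sketch below builds a vocabulary from a tokenized corpus and maps each document to a vector of word counts (a bag-of-words count vector). The helper names `build_vocabulary` and `count_vector` and the two-document corpus are made up for this example; they are not taken from the original article.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct token an integer index, in first-seen order."""
    vocabulary = {}
    for tokens in documents:
        for token in tokens:
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


def count_vector(tokens, vocabulary):
    """Return a dense list of token counts, one slot per vocabulary entry."""
    counts = Counter(tokens)
    vector = [0] * len(vocabulary)
    for token, n in counts.items():
        if token in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[token]] = n
    return vector


# Hypothetical toy corpus: each document is already split into words
corpus = [['我', '爱', '春天'], ['我', '喜欢', '冬天', '冬天']]
vocab = build_vocabulary(corpus)
vectors = [count_vector(doc, vocab) for doc in corpus]
print(vocab)    # → {'我': 0, '爱': 1, '春天': 2, '喜欢': 3, '冬天': 4}
print(vectors)  # → [[1, 1, 1, 0, 0], [1, 0, 0, 1, 2]]
```

Every document ends up as a list of the same length, so documents can be compared or fed to a model position by position.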

1. Corpus vectorization (code for converting a corpus into a computable form with Python)

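```python
# -*- coding: utf-8 -*-
# Vector representation of a corpus (corpus vectorization)
# The code below follows the book Natural Language Processing with Python

# Extracted features: representative words used to describe each document
features = ['春天', '冬天', '雪', '温暖']

# Negative-class (label '-1') sample corpus
neg_tweetList = [['我', '爱', '春天'], ['最', '喜欢', '春天']]
# Positive-class (label '1') sample corpus
pos_tweetList = [['我', '喜欢', '冬天'], ['最', '爱', '冬天']]

# Feature dictionary: index -> feature word
feature_dict = dict(enumerate(features))

# Pair each tokenized tweet with its polarity label
documents = ([(tweet, '-1') for tweet in neg_tweetList] +
             [(tweet, '1') for tweet in pos_tweetList])

vectorList = []
for tweetPolarity in documents:
    # Python 3 strings are already Unicode, so no .decode('utf-8') is needed
    tweet = ' '.join(tweetPolarity[0])
    # Sparse vector: feature index -> 1 for every feature word present in the tweet
    word_id_presence_dict = {}
    for index_id, word in enumerate(features):
        if word in tweetPolarity[0]:
            word_id_presence_dict[index_id] = 1
    category, vector = tweetPolarity[-1], word_id_presence_dict
    vectorDict = {tweet: (category, vector)}
    vectorList.append(vectorDict)

# Print once, after all documents have been vectorized
print(vectorList)
```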
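The loop above stores each tweet as a sparse dictionary that records only the indices of the feature words it contains. If a downstream classifier expects fixed-length numeric input, one option is to expand these into dense 0/1 presence vectors over the same `features` list. The sketch below assumes that goal; `presence_vector`, `X`, and `y` are hypothetical names introduced here, and the labels are stored as the integers -1/1 rather than the strings '-1'/'1' used above.

```python
features = ['春天', '冬天', '雪', '温暖']


def presence_vector(tokens, features):
    """Dense binary vector: 1 if the feature word occurs in tokens, else 0."""
    token_set = set(tokens)  # set lookup instead of scanning the list each time
    return [1 if word in token_set else 0 for word in features]


neg_tweets = [['我', '爱', '春天'], ['最', '喜欢', '春天']]
pos_tweets = [['我', '喜欢', '冬天'], ['最', '爱', '冬天']]

# Integer labels (assumption): -1 for the negative class, 1 for the positive class
labeled = [(t, -1) for t in neg_tweets] + [(t, 1) for t in pos_tweets]
X = [presence_vector(tokens, features) for tokens, _ in labeled]
y = [label for _, label in labeled]
print(X)  # → [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(y)  # → [-1, -1, 1, 1]
```

Equal-length lists paired with a label list are the shape that, for example, scikit-learn estimators accept in `fit(X, y)`. The trade-off is that dense vectors grow with the number of features, while the sparse dictionaries in the original code stay small when most features are absent.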

An article from 编橙之家.
