A word segmentation program based on forward maximum matching, implemented in Python
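The basic idea is to look up the first six characters of a sentence in the dictionary. If they do not form a word, drop the last of the six characters and look up again, and continue this way until a word is found. Then repeat the same procedure on the rest of the sentence until every word has been split off.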
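# ALLWORDS is the word dictionary (a dict or set of known words).
# It is defined elsewhere and is not shown in this snippet.
def FMMSplit(sentence):
    'This is Forward Maximum Matching method.'
    MAXRANGE = 6  # longest candidate word, in characters
    splitedWords = []
    finalPoint = len(sentence) - 1
    startPoint = 0
    while startPoint <= finalPoint:
        # Try the longest candidate first, then drop its last character.
        tempPoint = min(finalPoint, startPoint + MAXRANGE - 1)
        while tempPoint >= startPoint:
            subString = sentence[startPoint:tempPoint + 1]
            if subString in ALLWORDS:  # has_key() was removed in Python 3
                splitedWords.append(subString)
                break
            tempPoint -= 1
        # Resume right after the matched word. If nothing matched, tempPoint
        # has dropped to startPoint - 1, so this skips a single character.
        startPoint = max(tempPoint, startPoint) + 1
    return splitedWords
# This snippet is from http://byrx.net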
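To see the shrinking lookups described above on a concrete input, here is a minimal, self-contained sketch. It is not the original author's code: the name `fmm_trace`, the `words` set (a hypothetical stand-in for `ALLWORDS`), and the `max_len` parameter are assumptions made for illustration. Unlike `FMMSplit`, it keeps a character that matches no dictionary word as a single-character token, and it records every candidate it looks up.

```python
def fmm_trace(sentence, words, max_len=6):
    """Forward maximum matching that also records each candidate it looks up.

    `words` is a hypothetical stand-in for the ALLWORDS dictionary.
    A character that starts no dictionary word is kept as a one-character token.
    """
    tokens, tried = [], []
    start = 0
    while start < len(sentence):
        # Longest window first: at most max_len characters from `start`.
        end = min(len(sentence), start + max_len)
        while end > start + 1:
            candidate = sentence[start:end]
            tried.append(candidate)
            if candidate in words:
                break
            end -= 1  # drop the last character and look up again
        # Either a dictionary word, or a single leftover character.
        tokens.append(sentence[start:end])
        start = end  # continue with the rest of the sentence
    return tokens, tried


# Toy dictionary, assumed for this example only.
words = {"研究", "研究生", "生命", "起源"}
tokens, tried = fmm_trace("研究生命起源", words)
print(tokens)  # ['研究生', '命', '起源']
print(tried)
```

The result also shows a known weakness of the greedy approach: because the longest match "研究生" wins at the first position, the segmentation "研究 / 生命 / 起源" is never considered.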