Word Segmentation Based on Forward Maximum Matching (Python Implementation)


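The basic idea is to take the first six characters of a sentence and look them up in the dictionary. If they do not form a word, drop the last of those characters and look up again. Continue this way until a word is found. Then repeat the same procedure on the rest of the sentence until every word has been split off.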
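The single lookup step described above (start from at most six characters, then drop the last character until the dictionary matches) can be sketched on its own. The function name `longest_prefix_word` and the four-word dictionary are hypothetical and exist only for illustration. If nothing longer matches, the sketch assumes the first character is accepted as a one-character word.

```python
def longest_prefix_word(text, words, max_len=6):
    """Return (word, tried): the longest dictionary word at the start of
    `text`, plus every candidate that was looked up along the way.
    Assumption: falls back to the first character if nothing matches."""
    tried = []
    # Start with the first max_len characters (fewer if text is shorter).
    end = min(max_len, len(text))
    while end > 1:
        candidate = text[:end]
        tried.append(candidate)
        if candidate in words:
            return candidate, tried
        end -= 1  # drop the last character and look up again
    tried.append(text[:1])
    return text[:1], tried


# Hypothetical toy dictionary, for illustration only.
words = {"研究", "研究生", "生命", "起源"}
word, tried = longest_prefix_word("研究生命起源", words)
print(word)   # 研究生
print(tried)  # ['研究生命起源', '研究生命起', '研究生命', '研究生']
```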

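```python
# ALLWORDS is the word dictionary. The original snippet assumes it is a
# global defined elsewhere; a set of words is assumed here (load it yourself).
ALLWORDS = set()

def FMMSplit(sentence):
    'Segment a sentence with the Forward Maximum Matching method.'
    MAXRANGE = 6
    splitedWords = []
    finalPoint = len(sentence) - 1
    startPoint = 0
    while startPoint <= finalPoint:
        # Longest window first: at most MAXRANGE characters.
        tempPoint = min(finalPoint, startPoint + MAXRANGE - 1)
        # Drop the last character until the substring is a dictionary word;
        # a single character is kept as a word if nothing longer matches.
        while tempPoint > startPoint:
            if sentence[startPoint:tempPoint + 1] in ALLWORDS:
                break
            tempPoint -= 1
        splitedWords.append(sentence[startPoint:tempPoint + 1])
        # Continue right after the word that was just split off.
        startPoint = tempPoint + 1
    return splitedWords
# This snippet comes from http://byrx.net
```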
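To try the segmenter without a global `ALLWORDS`, the dictionary can be passed in as an argument. The following is a minimal sketch of that variant. The names `fmm_split`, `words`, and `max_len` are introduced here and do not come from the original snippet, and the dictionary is a hypothetical toy.

```python
def fmm_split(sentence, words, max_len=6):
    """Forward maximum matching with the dictionary passed in explicitly."""
    result = []
    start = 0
    while start < len(sentence):
        # Longest window first, never running past the end of the sentence.
        end = min(start + max_len, len(sentence))
        # Shrink from the right; stop at one character (kept as a word).
        while end > start + 1 and sentence[start:end] not in words:
            end -= 1
        result.append(sentence[start:end])
        start = end  # resume right after the word just split off
    return result


# Hypothetical toy dictionary, for illustration only.
words = {"研究", "研究生", "生命", "起源"}
print(fmm_split("研究生命起源", words))
# ['研究生', '命', '起源']
```

The output also illustrates the main weakness of greedy forward matching. The more natural segmentation 研究 / 生命 / 起源 ("study / life / origin") is missed because the longer word 研究生 ("graduate student") is matched first.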
