Python code I use all the time


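Today, for one reason or another, I was writing Python scripts again. Lazy as I am, I rarely do by hand anything a machine can do for me. Counting it up, I have written quite a lot of Python scripts by now, mostly for text analysis, so I pulled out a few functions I use again and again.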

Written for Python 3.2: these are the functions I reach for when analyzing large amounts of text.

PS: GetFileFromThisRootDir used to match file extensions case-sensitively; it now ignores case.
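The case-insensitive extension check described in the PS comes down to a single comparison. Below is a minimal sketch using a hypothetical helper name, has_wanted_ext (not part of the original script). It lower-cases only the extension and the list of wanted extensions, never the whole path, so any path built from the file name still matches the real file on case-sensitive file systems such as Linux.

```python
import os


def has_wanted_ext(filename, exts=None):
    """Hypothetical helper: True if filename's extension (without the dot)
    is in exts, compared case-insensitively. exts=None accepts every file."""
    if exts is None:
        return True
    wanted = set(e.lower() for e in exts)
    # splitext returns ('Main', '.JAVA'); drop the dot, then lower-case
    # only the extension so the file name itself is left untouched.
    return os.path.splitext(filename)[1][1:].lower() in wanted
```

For example, has_wanted_ext('Main.JAVA', ['xml', 'java']) is True, while 'notes.txt' is rejected by the same list.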

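    # -*- coding: utf-8 -*-
    # Functions I keep needing when writing Python scripts,
    # collected here so I don't have to rewrite them every time.
    # Requires Python 3.2 or later.
    __author__ = 'soso_fy'

    import codecs
    import os


    # Read a text file and return its contents as a str.
    # Supports UTF-8 with or without BOM, UTF-16 (with BOM), GBK and GB2312.
    # Returns '' if the file cannot be opened or decoded.
    def ReadTextFile(filepath):
        try:
            with open(filepath, 'rb') as f:
                raw = f.read()
        except IOError as err:
            print('Error reading file in ReadTextFile:', err)
            return ''  # without this, the code below would hit an undefined name

        # A byte-order mark identifies the encoding unambiguously.
        if raw.startswith(codecs.BOM_UTF8):
            return raw[len(codecs.BOM_UTF8):].decode('utf-8')
        if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            return raw.decode('utf-16')  # the 'utf-16' codec consumes the BOM

        # No BOM: try strict UTF-8 first, since GBK text rarely decodes as
        # valid UTF-8. BOM-less UTF-16 is not tried: decoding "succeeds" on
        # almost any even-length input and would return garbage.
        # GBK is a superset of GB2312, so it also covers GB2312 files.
        for encoding in ('utf-8', 'gbk'):
            try:
                return raw.decode(encoding)
            except UnicodeDecodeError:
                pass
        print('Unsupported text file encoding:', filepath)
        return ''


    # Return every file under directory `dir` (searched recursively)
    # whose extension is in `ext`.
    # ext: a list of extensions without the leading dot, e.g. ['xml', 'java'];
    #      matching is case-insensitive. Leave it as None to get all files.
    def GetFileFromThisRootDir(dir, ext=None):
        allfiles = []
        needExtFilter = ext is not None
        if needExtFilter:
            ext = [x.lower() for x in ext]
        for root, dirs, files in os.walk(dir):
            for filename in files:
                # Keep the path's original case so it can still be opened on
                # case-sensitive file systems; lower-case only the extension.
                filepath = os.path.join(root, filename)
                extension = os.path.splitext(filename)[1][1:].lower()
                if not needExtFilter or extension in ext:
                    allfiles.append(filepath)
        return allfiles

    # This snippet comes from http://byrx.net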
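The order in which ReadTextFile tries encodings matters, because a decoder that raises no error has not necessarily produced the right text. The sketch below uses a hypothetical helper, first_successful_decoding (not part of the original script), to illustrate this. With UTF-16 tried first, the UTF-8 bytes for 中文 "decode" without error into unrelated characters. Trying strict UTF-8 first, then GBK, recovers the right text. The encoding orders are assumptions for illustration, not a general-purpose charset detector.

```python
def first_successful_decoding(raw, encodings):
    """Hypothetical helper: return (encoding, text) for the first encoding
    in `encodings` that decodes `raw` without error, or (None, '')."""
    for encoding in encodings:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None, ''


utf8_bytes = '中文'.encode('utf-8')  # 6 bytes, so UTF-16 will not complain
gbk_bytes = '中文'.encode('gbk')

# UTF-16 first: no error is raised, but the text is wrong.
print(first_successful_decoding(utf8_bytes, ('utf-16', 'utf-8')))
# Strict UTF-8 first, then GBK: both inputs come back correctly.
print(first_successful_decoding(utf8_bytes, ('utf-8', 'gbk', 'utf-16')))
print(first_successful_decoding(gbk_bytes, ('utf-8', 'gbk', 'utf-16')))
```

UTF-8 makes a good first guess because its byte patterns are strict. GBK is permissive, and BOM-less UTF-16 is more permissive still, so the more permissive a codec is, the later it belongs in the list.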
