Batch-convert &#XXXXX; entities in HTML files to Chinese characters and convert the file encoding to UTF-8
lomatus
#!/usr/bin/python
#coding=utf-8
#Author Lomatus
#Email tourszhou#gmail.com
# Python 2 script (uses unichr and the print statement).
import sys, os, re

def utoutf(htm):
    # Read the whole file into memory.
    op = open(htm,'r')
    s = op.read()
    op.close()
    # Replace every 5-digit numeric entity (&#NNNNN;) with its UTF-8 bytes.
    unic = re.findall("&#\d{5};",s)
    for u in unic:
        num = int(u[2:7])
        utf = unichr(num).encode('UTF-8')
        s = s.replace(u,utf)
    # Update the declared charset to match the new encoding.
    out = s.replace("Windows-1252","UTF-8")
    op = open(htm,'w')
    op.write(out)
    op.close()

if __name__ == "__main__":
    argLen = len(sys.argv)
    if argLen > 2 :
        print "Syntax error"
    elif argLen==2:
        p = sys.argv[1]
        if re.match("^\w+\.htm",p):
            # Argument is a single .htm file.
            utoutf(p)
            print 'File:',p,'converted'
        elif os.path.exists(p):
            # Argument is a folder; make a relative path absolute (Windows-style paths).
            if not re.match("^\D{1}:\\\\",p):
                p = os.getcwd()+"\\"+sys.argv[1]
            print "Read Folder:"+p
            os.chdir(p)
            filelist = os.listdir(p)
            i = 0
            for file in filelist:
                if re.match("^\w+\.htm",file):
                    utoutf(file)
                    i = i+1
                    print i," File:",file," Converted!"
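The script above targets Python 2. For Python 3, where `unichr` and the `print` statement are gone, the same idea might look roughly like the sketch below. It is not the author's code: the names `decode_entities` and `convert_file` are hypothetical, and it assumes the input files really are encoded as Windows-1252 (`cp1252`), which the original only implies by rewriting the charset declaration. Like the original, it only touches 5-digit decimal entities, so short references such as `&#60;` stay escaped.

```python
import re
from pathlib import Path

# Same pattern as the original script: exactly five decimal digits.
ENTITY_RE = re.compile(r"&#(\d{5});")

def decode_entities(text):
    """Replace each 5-digit reference &#NNNNN; with the character it names."""
    def repl(match):
        code = int(match.group(1))
        # Surrogate code points cannot be written as UTF-8; leave them as-is.
        if 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)
    return ENTITY_RE.sub(repl, text)

def convert_file(path, source_encoding="cp1252"):
    """Decode entities in one file and rewrite it as UTF-8 (hypothetical helper)."""
    path = Path(path)
    # Assumption: the file is really encoded in source_encoding.
    text = path.read_text(encoding=source_encoding)
    text = decode_entities(text).replace("Windows-1252", "UTF-8")
    path.write_text(text, encoding="utf-8")
```

To process a whole folder, one option is to loop over `Path(folder).glob("*.htm")` and call `convert_file` on each result. Decoding the file as text before re-encoding also converts any raw Windows-1252 bytes (for example accented Latin letters) to UTF-8, which the byte-level replacement in the Python 2 script does not do.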