How to fix errors when saving Python Unicode text to a file as UTF-8

Shared by Byrx.net on 2019-03-23 06:03:29, 9 comments


Method 1

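import requests

def get_html(url):
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
            "Connection": "keep-alive",
        }
        r = requests.get(url, headers=headers)
        # r.text is the body decoded to Unicode using the encoding requests picked
        return r.text
    except Exception as ex:
        # Any failure (network, decoding, ...) is swallowed and reported as None
        return None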

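Question: requests decodes the response into Unicode automatically, doesn't it? Then why does writing the return value of the function above to a file as UTF-8 (via write_file) so often raise an error (the usual encode/decode errors)?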

html = get_html("xxxxx")
write_file("a.html", html)

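import codecs

def write_file(file_name, content, mode="wb", encoding="utf-8"):
    # The file is opened in binary mode, so the content is encoded by hand before writing
    with codecs.open(file_name, mode=mode) as f:
        f.write(content.encode(encoding, "ignore"))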
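One way to narrow down where such an error comes from is to make the write step stricter about what it receives. The following is only a minimal sketch, assuming Python 3; `write_text_file` is a hypothetical helper name, not part of the original post. It rejects `None` (which `get_html` returns on any exception), writes bytes untouched, and lets the file object encode text.

```python
import io


def write_text_file(file_name, content, encoding="utf-8"):
    """Hypothetical helper: write text or bytes to file_name."""
    if content is None:
        # get_html returns None when the request failed
        raise ValueError("content is None; the download probably failed")
    if isinstance(content, bytes):
        # Already encoded: write the raw bytes without touching them
        with io.open(file_name, "wb") as f:
            f.write(content)
        return
    # Text: the file object encodes it; newline="" disables newline translation
    with io.open(file_name, "w", encoding=encoding,
                 errors="ignore", newline="") as f:
        f.write(content)
```

If the `ValueError` shows up, the problem is in the download step rather than in the encoding.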

However, the following code does not raise the error:

Method 2

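import chardet
import requests

def get_html(url):
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
            "Connection": "keep-alive",
        }
        r = requests.get(url, headers=headers)
        # Re-encode the text with the encoding requests picked, giving a byte string back
        return r.text.encode(r.encoding, "ignore")
    except Exception as ex:
        return None

html = get_html("xxxxx")
# Let chardet guess the encoding of the bytes, then decode them to Unicode
html = html.decode(chardet.detect(html)["encoding"], 'ignore')
write_file("a.html", html)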

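Here get_html first returns an encoded byte string (in which encoding is unknown), and chardet is then used to decode html into Unicode.
That Unicode can be written directly with write_file and is saved correctly as UTF-8 without a BOM.
With Method 1, however, errors occur frequently.
Question 1: Both values are Unicode, so why do the two methods behave differently?
Question 2: Is there another solution that does not use chardet? (The main concern is that chardet is slow at detecting the character encoding.)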
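On Question 2, one possible approach that avoids chardet is to work from the raw bytes (`r.content` in requests) and try a short list of candidate encodings in order: the charset declared in the Content-Type header, the charset declared in an HTML `<meta>` tag, and then a few fallbacks. The sketch below assumes Python 3. `decode_html`, its parameters, and the `("utf-8", "gb18030")` fallback list are assumptions for illustration, not something from the original post.

```python
import re

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">
_META_CHARSET = re.compile(br'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.I)


def decode_html(raw, header_charset=None, fallbacks=("utf-8", "gb18030")):
    """Hypothetical helper: decode raw HTML bytes without statistical detection."""
    candidates = []
    if header_charset:
        candidates.append(header_charset)
    # Only look at the start of the document, where <meta> tags normally live
    match = _META_CHARSET.search(raw[:4096])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(fallbacks)
    for name in candidates:
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate
            continue
    # Last resort: never raise, mark undecodable bytes instead
    return raw.decode("utf-8", "replace")


# Possible use with requests (not executed here):
# r = requests.get(url)
# has_charset = "charset" in r.headers.get("Content-Type", "").lower()
# html = decode_html(r.content, r.encoding if has_charset else None)
```

The header charset is passed only when the server actually declared one, because requests falls back to ISO-8859-1 for text responses without a charset, and ISO-8859-1 decodes any byte sequence without error, which would hide a wrong guess.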
