Ubuntu 12.04: garbled output when printing a urllib2 response

Shared by Byrx.net on 2019-03-23 08:03:14. Comments (89)


The code is as follows:

# -*- encoding=utf-8 -*-
import urllib2
import sys

content = urllib2.urlopen('http://www.douban.com').read()
type = sys.getfilesystemencoding()
print content
print content.decode("UTF-8").encode(type)

Printing content just produces a pile of garbage:
��}isI��w��,U�$��i��o�tOL��{_��)
(��b��q+٭}o˖e��M��E�7!�Eܟb�U��ᝬ*�Ul$��V@-�'3�~2O�--�_��?�~��-�CD��tyt��6}��xܣ��,��0+0��Y��6�t�c

Then the decode step fails with: UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte

The environment is Ubuntu 12.04 with Python 2.7. What exactly is going wrong here?
Thanks in advance!

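First of all, your code runs fine for me and the data displays normally. Judging from your output, though, this is probably not a case of the fetched page having the wrong encoding (there are not even any ASCII characters in it; with ordinary mojibake the English characters still show up). My guess is that the encoding of your source file is wrong.
Also, the source file encoding is conventionally declared with coding:
# -*- coding=utf-8 -*-
And type is a built-in; in general, do not name variables after Python's built-in types.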
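To illustrate the naming point above, here is a small sketch of what happens once the name type is rebound. The function name shadowed_type_call and the variable names are made up for this example. Inside the function, type refers to a string, so calling it fails; outside, the built-in is untouched.

```python
import sys


def shadowed_type_call():
    """Rebind `type` locally and report what calling it does."""
    type = sys.getfilesystemencoding()  # `type` is now an encoding name, not the built-in
    try:
        type(b"hello")                  # fails: the rebound object is not callable
    except TypeError as exc:
        return str(exc)
    return None


shadow_error = shadowed_type_call()

# A descriptive name avoids the clash and keeps the built-in usable.
fs_encoding = sys.getfilesystemencoding()
kind = type(b"hello")
```

In the question's script the rebinding happens at module level, so any later type(content) call in the same file would hit the same error.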
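The response may be compressed. Check whether the headers contain Content-Encoding: xxx
If it is compressed you have to decompress it yourself; urllib will not do it for you.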
def unzip(self, data):
    import gzip
    import StringIO
    data = StringIO.StringIO(data)
    gz = gzip.GzipFile(fileobj=data)
    data = gz.read()
    gz.close()
    return data
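The traceback in the question fits the compression theory: a gzip stream starts with the two bytes 0x1f 0x8b, and 0x8b at position 1 is exactly what the UTF-8 decoder complained about. Below is a minimal sketch of checking for that signature before decompressing. The helper names looks_gzipped and gunzip_if_needed are hypothetical, the body is simulated locally instead of fetched from the network, and io.BytesIO is used in place of StringIO so the snippet should run on both Python 2.7 and Python 3.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(data):
    """Return True if the byte string starts with the gzip signature."""
    return data[:2] == GZIP_MAGIC


def gunzip_if_needed(data, content_encoding=None):
    """Decompress when the header says gzip or the signature matches."""
    header = (content_encoding or "").strip().lower()
    if header == "gzip" or looks_gzipped(data):
        gz = gzip.GzipFile(fileobj=io.BytesIO(data))
        try:
            return gz.read()
        finally:
            gz.close()
    return data


# Simulate a gzip-compressed UTF-8 page body (no network access needed).
original = u"\u8c46\u74e3 douban".encode("utf-8")
buf = io.BytesIO()
writer = gzip.GzipFile(fileobj=buf, mode="wb")
writer.write(original)
writer.close()
compressed = buf.getvalue()

# Decoding the compressed bytes directly fails at the second byte.
try:
    compressed.decode("utf-8")
    error_position = None
except UnicodeDecodeError as exc:
    error_position = exc.start

restored = gunzip_if_needed(compressed, "gzip")
untouched = gunzip_if_needed(original)
```

With urllib2 the header value would typically come from something like response.info().get('Content-Encoding') before the body is passed to the helper.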
A very strange problem. Try logging and see whether Chinese text comes out correctly.
# -*- encoding=utf-8 -*-
import logging
import urllib2
import sys

logging.basicConfig(level=logging.INFO)
content = urllib2.urlopen('http://www.douban.com').read()
logging.info(type(content))
logging.info(content.decode('utf-8'))
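Once the body decodes cleanly, the remaining step in the question's code is re-encoding it for output. Here is a rough sketch of doing that defensively. The helper name encode_for_terminal is hypothetical, and falling back to the locale's preferred encoding and then UTF-8 is an assumption, not something the thread prescribes. Unencodable characters are replaced instead of raising.

```python
import locale
import sys


def encode_for_terminal(text, encoding=None):
    """Encode unicode text for output, replacing characters the target cannot represent."""
    target = (encoding
              or getattr(sys.stdout, "encoding", None)  # may be None, e.g. when output is piped
              or locale.getpreferredencoding()
              or "utf-8")
    return text.encode(target, "replace")


# Stand-in for an already decompressed page body.
page_bytes = u"\u8c46\u74e3 douban".encode("utf-8")
text = page_bytes.decode("utf-8")

as_utf8 = encode_for_terminal(text, "utf-8")
as_ascii = encode_for_terminal(text, "ascii")  # non-ASCII characters become '?'
```

If the terminal really is UTF-8 the bytes round-trip unchanged; on a narrower encoding the output degrades to question marks instead of raising UnicodeEncodeError.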
