How can I speed up slow image downloads in my Scrapy pipeline?
I did not use the ImagesPipeline described in the documentation. Why is my own implementation so slow?
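For context, one likely cause of the slowdown is that the downloads in the pipeline below run synchronously inside process_item, so each image blocks the crawl; the ImagesPipeline mentioned in the docs instead schedules image requests through Scrapy's own concurrent downloader. A minimal sketch of that documented setup, assuming a modern Scrapy import path and reusing the 'av' directory from the code below:

```python
# settings.py -- enable the documented ImagesPipeline, which downloads
# images through Scrapy's concurrent downloader instead of blocking
# inside process_item
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = 'av'  # assumption: same output directory as the custom pipeline
```

The spider's items would then expose the URLs in an `image_urls` field (the field name the ImagesPipeline reads by default) instead of the `imgurl` field used below.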
My code is below:
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import os

import requests


class boboCrawlPipeline(object):
    def __init__(self):
        self.f = open('data.txt', 'w+')
        self.browse_headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:22.0) Gecko/20100101 Firefox/22.0'
        }
        # Reuse one session so TCP connections are kept alive across
        # downloads, instead of opening a new one per attempt
        self.session = requests.Session()
        if not os.path.exists('av'):
            os.makedirs('av')
        os.chdir('av')

    def process_item(self, item, spider):
        title = item['headTitle'][0].split('-')[0].encode('gbk').rstrip()
        dirname = title.decode('gbk')
        if not os.path.exists(dirname):
            os.makedirs(dirname)
        self.f.write(title + '\n')
        for img in set(item['imgurl']):
            self.down_link(img, dirname + '/' + os.path.basename(img))
        return item

    def down_link(self, url, filename, istorrent=0):
        forumurl = "http://38.103.161.185"
        if os.path.exists(filename) and os.path.getsize(filename) > 0:
            # TODO: verify by MD5 instead of file size
            return
        if url.find('attachments/month') >= 0 or url.find('attachments/day') >= 0:
            # Relative image URL from this forum: prepend the base URL
            url = forumurl + "/forum/" + url
        attempts = 0
        while attempts < 10:
            try:
                resp = self.session.get(url, headers=self.browse_headers, timeout=10)
                if not resp.content:
                    return
                # Bug in the original: open(...).write(...) returns the byte
                # count, so the following f.close() raised AttributeError and
                # forced a retry on every single download.
                with open(filename, "wb") as f:
                    f.write(resp.content)
                break
            except Exception:
                attempts += 1
        return

    def close_spider(self, spider):
        self.f.close()
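If the hand-rolled requests approach is kept, the per-image downloads can at least run in parallel on a thread pool instead of one at a time per item. A minimal sketch, where the fetch callable and worker count are illustrative assumptions rather than part of the original code:

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, max_workers=8):
    # Run fetch(url) for every URL on a pool of worker threads and
    # return the results in the same order as the input list.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Inside the pipeline, fetch would wrap the session.get(...) call plus the file write for one image, so the ten-second timeouts overlap instead of adding up across images.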