Python Scrapy framework: handling GIF files downloaded with the ImagesPipeline component
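By default, when images are downloaded with Scrapy's ImagesPipeline component, they are all saved in JPEG format, regardless of whether the original image was a PNG or a GIF.
By overriding the file_path method, an image can be saved under its original file name and extension.
Overriding the file_path method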
# coding: utf-8
__author__ = 'Fly'

# Note: newer Scrapy releases expose this class as scrapy.pipelines.images.ImagesPipeline.
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.http import Request
from scrapy.exceptions import DropItem


class MyImagesPipeline(ImagesPipeline):

    def file_path(self, request, response=None, info=None):
        # Use the last URL segment (e.g. "1.gif") as the file name.
        image_guid = request.url.split('/')[-1]
        return 'full/%s' % (image_guid)

    def get_media_requests(self, item, info):
        # Schedule one download per image URL on the item.
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        # Drop items for which no image was downloaded successfully.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        return item
Result
Image URL: http://www.baidu.com/1.gif
Saved locally as: 1.gif
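The file name above comes from taking the last segment of the URL. As a minimal sketch of that step outside Scrapy, assuming Python 3, the hypothetical helpers `filename_from_url` and `storage_path` below (they are not part of Scrapy) do the same thing while also ignoring any query string or fragment, which a plain `split('/')` would leave in the name:

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment of `url`, without query string or fragment."""
    path = urlsplit(url).path
    return posixpath.basename(path)


def storage_path(url, folder="full"):
    """Build the relative path under which an image would be stored."""
    name = filename_from_url(url)
    if not name:
        # URLs such as "http://example.com/" have no usable file name.
        raise ValueError("URL has no file name: %r" % url)
    return "%s/%s" % (folder, name)


if __name__ == "__main__":
    print(storage_path("http://www.baidu.com/1.gif"))
```

Inside the pipeline, the same logic would replace the `request.url.split('/')[-1]` expression in `file_path`.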
However, when I open 1.gif, the image that was originally animated has become a static one.
Does anyone know how to deal with this?
Try overriding convert_image:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/pipeline/images.py#L87
# coding: utf-8
__author__ = 'Fly'

from io import BytesIO

# Note: newer Scrapy releases expose this class as scrapy.pipelines.images.ImagesPipeline.
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.http import Request
from scrapy.exceptions import DropItem


class MyImagesPipeline(ImagesPipeline):

    def file_path(self, request, response=None, info=None):
        image_guid = request.url.split('/')[-1]
        return 'full/%s' % (image_guid)

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        return item

    def convert_image(self, image, size=None):
        # Re-save in the image's own format instead of converting to JPEG.
        # Untested: PIL may still write only the first frame of an animated GIF.
        buf = BytesIO()
        image.save(buf, image.format or 'JPEG')
        return image, buf
Give it a try; it may raise errors. According to the documentation, this pipeline will:
- Convert all downloaded images to a common format (JPG) and mode (RGB)
- Avoid re-downloading images which were downloaded recently
- Thumbnail generation
- Check images width/height to make sure they meet a minimum constraint
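Since the pipeline re-encodes every image it downloads, one way to keep animated GIFs intact would be to recognise them from the raw response bytes and store those bytes unchanged. The sketch below covers only the detection step, using the well-known file signatures of GIF, PNG and JPEG; the helper names `sniff_image_format` and `should_store_unmodified` are hypothetical and not part of Scrapy.

```python
# File signatures (magic numbers) found at the start of GIF, PNG and JPEG data.
_SIGNATURES = (
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
)


def sniff_image_format(body):
    """Guess the image format from the first bytes of `body`, or return None."""
    for signature, name in _SIGNATURES:
        if body.startswith(signature):
            return name
    return None


def should_store_unmodified(body):
    """Return True when the bytes look like a GIF, which may hold several frames."""
    return sniff_image_format(body) == "gif"
```

In a pipeline, a check such as `should_store_unmodified(response.body)` could decide whether to write the response bytes as they are rather than passing them through the image conversion; where to hook that in depends on the Scrapy version and is not shown here. Scrapy's FilesPipeline, which stores downloads without re-encoding them, may be a simpler option.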