Python Scrapy: overridden method is never called (source code included, analysis requested)
Environment
Python: 2.7.6 (64-bit)
Scrapy: 0.22.2 (64-bit)
OS: Windows 7 (64-bit)
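Problem
By default, when images are downloaded with the ImagesPipeline component, each file is saved under a name derived from the SHA1 hash of the image URL.
For example:
Image URL: http://www.example.com/image.jpg
SHA1 result: 3afec3b4765f8f0a07b78f98c07b83f013567a0a
Resulting file name: 3afec3b4765f8f0a07b78f98c07b83f013567a0a.jpg
However, I want to save each image under its original name. For the example above, the file saved locally should be named image.jpg.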
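The two naming schemes described above can be compared outside Scrapy with a small sketch. The helper names default_image_name and original_image_name are hypothetical (they are not part of Scrapy), and the first one only imitates the "SHA1 of the URL plus .jpg" rule as the question describes it:

```python
import hashlib
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, as used in the question


def default_image_name(url):
    """Imitate the default naming described above: SHA1 of the URL plus .jpg."""
    return '%s.jpg' % hashlib.sha1(url.encode('utf-8')).hexdigest()


def original_image_name(url):
    """Return the last path segment of the URL, ignoring any query string."""
    return posixpath.basename(urlparse(url).path)


if __name__ == '__main__':
    url = 'http://www.example.com/image.jpg'
    print(default_image_name(url))
    print(original_image_name(url))
```

One trade-off to keep in mind: two different URLs can end in the same file name, in which case the later download would overwrite the earlier one, whereas a hash of the full URL avoids such collisions.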
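An answer on Stack Overflow says this can be done by overriding the image_key method, but when I tried it, my overridden image_key was never called. I then looked at the ImagesPipeline source code: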
    class ImagesPipeline(FilesPipeline):
        """Abstract pipeline that implement the image thumbnail generation logic

        """

        MEDIA_NAME = 'image'
        MIN_WIDTH = 0
        MIN_HEIGHT = 0
        THUMBS = {}
        DEFAULT_IMAGES_URLS_FIELD = 'image_urls'
        DEFAULT_IMAGES_RESULT_FIELD = 'images'

        ... (omitted)

        def file_path(self, request, response=None, info=None):
            ## start of deprecation warning block (can be removed in the future)
            def _warn():
                from scrapy.exceptions import ScrapyDeprecationWarning
                import warnings
                warnings.warn('ImagesPipeline.image_key(url) and file_key(url) methods are deprecated, '
                              'please use file_path(request, response=None, info=None) instead',
                              category=ScrapyDeprecationWarning, stacklevel=1)

            # check if called from image_key or file_key with url as first argument
            if not isinstance(request, Request):
                _warn()
                url = request
            else:
                url = request.url

            # detect if file_key() or image_key() methods have been overridden
            if not hasattr(self.file_key, '_base'):
                _warn()
                return self.file_key(url)
            elif not hasattr(self.image_key, '_base'):
                _warn()
                return self.image_key(url)
            ## end of deprecation warning block

            image_guid = hashlib.sha1(url).hexdigest()  # change to request.url after deprecation
            return 'full/%s.jpg' % (image_guid)

        # deprecated
        def image_key(self, url):
            return self.file_path(url)
        image_key._base = True

        ... (omitted)
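It contains this message:
ImagesPipeline.image_key(url) and file_key(url) methods are deprecated, please use file_path(request, response=None, info=None) instead
In other words, in the latest version of Scrapy (0.22.2), file_path is meant to be used in place of image_key.
So I overrode file_path in my custom ImagesPipeline subclass, but at run time that method was not called either.
The code is as follows: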
    from scrapy.contrib.pipeline.images import ImagesPipeline
    from scrapy.exceptions import DropItem
    from scrapy.http import Request
    import os

    class DownPhotosPipeline(ImagesPipeline):

        def file_path(self, request):
            print "~~~~~~~~~~~~~~~~~~~~~~"
            print "~~~~~~~"+request.url+"~~~~~~~"
            print "~~~~~~~~~~~~~~~~~~~~~~"
            image_guid = request.url.split('/')[-1]
            return 'full/%s' % (image_guid)

        def get_media_requests(self, item, info):
            for image_url in item['images']:
                yield Request(image_url)

        def item_completed(self, results, item, info):
            image_paths = [x['path'] for ok, x in results if ok]
            if not image_paths:
                raise DropItem("Item contains no images")
            #item['image_paths'] = image_paths
            return item
settings.py
    DOWNLOAD_DELAY = 2
    IMAGES_STORE = 'budejie_photos'
    DOWNLOAD_TIMEOUT = 1200
    ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
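Answer
Change
    def file_path(self, request):
to
    def file_path(self, request, response=None, info=None):
and it works: the override then has the same signature as ImagesPipeline.file_path in the source quoted above. Inside file_path, simply return the file name (path) you want the image saved under.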
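Why the narrower signature fails can be illustrated without Scrapy. The sketch below uses hypothetical stand-in classes (FakeRequest, FakeImagesPipeline, NarrowPipeline, FixedPipeline are not Scrapy code), and it assumes that the framework passes response and info to file_path as keyword arguments, which is what the base signature quoted earlier suggests:

```python
class FakeRequest(object):
    """Hypothetical stand-in for scrapy.http.Request; it only carries a URL."""

    def __init__(self, url):
        self.url = url


class FakeImagesPipeline(object):
    """Hypothetical stand-in for ImagesPipeline (not Scrapy code)."""

    def file_path(self, request, response=None, info=None):
        return 'full/default.jpg'

    def media_downloaded(self, request, response=None, info=None):
        # Assumption: the extra arguments are passed by keyword.
        return self.file_path(request, response=response, info=info)


class NarrowPipeline(FakeImagesPipeline):
    # Signature from the question: accepts only the request.
    def file_path(self, request):
        return 'full/%s' % request.url.split('/')[-1]


class FixedPipeline(FakeImagesPipeline):
    # Signature from the answer: matches the base method.
    def file_path(self, request, response=None, info=None):
        return 'full/%s' % request.url.split('/')[-1]


def try_download(pipeline, url):
    """Return the chosen path, or the TypeError message if the call fails."""
    try:
        return pipeline.media_downloaded(FakeRequest(url))
    except TypeError as exc:
        return 'TypeError: %s' % exc


if __name__ == '__main__':
    url = 'http://www.example.com/image.jpg'
    print(try_download(NarrowPipeline(), url))
    print(try_download(FixedPipeline(), url))
```

Under that assumption, the one-argument override is rejected with a TypeError as soon as the caller supplies response or info, while the three-argument version returns the original file name. Separately, Scrapy only runs the pipelines listed in ITEM_PIPELINES, and the settings.py in the question lists the stock scrapy.contrib.pipeline.images.ImagesPipeline, so the subclass also has to be registered there under its own dotted path for the override to take effect.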