Python Scrapy: overridden method is never called (source included, looking for an analysis)

Shared by Byrx.net on 2019-03-23 08:03:45


Environment

Python: 2.7.6 (64-bit)
Scrapy: 0.22.2 (64-bit)
OS: Windows 7 (64-bit)

Problem

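By default, when images are downloaded with the ImagesPipeline component, each file is saved under the SHA1 hash of the image URL.
For example:
Image URL: http://www.example.com/image.jpg
SHA1: 3afec3b4765f8f0a07b78f98c07b83f013567a0a
Saved file name: 3afec3b4765f8f0a07b78f98c07b83f013567a0a.jpg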
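The default naming rule can be reproduced in a few lines of plain Python. This is a minimal sketch, not Scrapy's code: the helper name default_image_path is hypothetical, and it mirrors the 'full/%s.jpg' % sha1(url) expression from the ImagesPipeline source quoted in this post. The URL is encoded to bytes so the sketch runs on Python 3 as well as 2.7.

```python
import hashlib


def default_image_path(url):
    """Hypothetical helper: mimic ImagesPipeline's default naming (SHA1 of the URL)."""
    # SHA1 works on bytes; encoding keeps this valid on Python 3 as well as 2.7
    image_guid = hashlib.sha1(url.encode('utf-8')).hexdigest()
    # Full-size images go into a "full/" subdirectory, with a .jpg extension appended
    return 'full/%s.jpg' % image_guid


print(default_image_path('http://www.example.com/image.jpg'))
```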
However, I want to keep the original file name: the image in the example above should be saved locally as image.jpg.
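One way to get image.jpg from that URL is to take the last segment of the URL path. A sketch, assuming the URL path ends in the file name; the helper name original_image_path is hypothetical. Parsing the URL first (instead of splitting the raw string on '/') keeps a query string such as ?v=2 out of the file name.

```python
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def original_image_path(url):
    """Hypothetical helper: keep the URL's file name instead of a SHA1 hash."""
    # urlparse separates the path from any query string or fragment
    path = urlparse(url).path
    # basename('/a/image.jpg') -> 'image.jpg'
    return 'full/%s' % posixpath.basename(path)


print(original_image_path('http://www.example.com/image.jpg'))  # full/image.jpg
print(original_image_path('http://www.example.com/a/image.jpg?v=2'))  # full/image.jpg
```

Unlike the SHA1 scheme, two different URLs that end in the same file name map to the same path, so they collide.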
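Answers on Stack Overflow say you can override the image_key method, but when I tried it, it did not work: my overridden image_key was never called. So I looked at the ImagesPipeline source: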

class ImagesPipeline(FilesPipeline):
    """Abstract pipeline that implement the image thumbnail generation logic

    """

    MEDIA_NAME = 'image'
    MIN_WIDTH = 0
    MIN_HEIGHT = 0
    THUMBS = {}
    DEFAULT_IMAGES_URLS_FIELD = 'image_urls'
    DEFAULT_IMAGES_RESULT_FIELD = 'images'

    # ... (omitted)

    def file_path(self, request, response=None, info=None):
        ## start of deprecation warning block (can be removed in the future)
        def _warn():
            from scrapy.exceptions import ScrapyDeprecationWarning
            import warnings
            warnings.warn('ImagesPipeline.image_key(url) and file_key(url) methods are deprecated, '
                          'please use file_path(request, response=None, info=None) instead',
                          category=ScrapyDeprecationWarning, stacklevel=1)

        # check if called from image_key or file_key with url as first argument
        if not isinstance(request, Request):
            _warn()
            url = request
        else:
            url = request.url

        # detect if file_key() or image_key() methods have been overridden
        if not hasattr(self.file_key, '_base'):
            _warn()
            return self.file_key(url)
        elif not hasattr(self.image_key, '_base'):
            _warn()
            return self.image_key(url)
        ## end of deprecation warning block

        image_guid = hashlib.sha1(url).hexdigest()  # change to request.url after deprecation
        return 'full/%s.jpg' % (image_guid)

    # deprecated
    def image_key(self, url):
        return self.file_path(url)
    image_key._base = True

    # ... (omitted)
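The override detection in the deprecation block relies on a marker attribute: the base image_key carries _base = True, and a subclass that redefines image_key gets a new function without that attribute. Below is a stripped-down model of the same dispatch. The class names MiniPipeline and LegacyOverride are hypothetical, it takes a plain URL string instead of a Request, and it skips the file_key check and the warning; no Scrapy import is needed.

```python
import hashlib


class MiniPipeline(object):
    """Hypothetical, simplified model of the deprecation dispatch in ImagesPipeline."""

    def file_path(self, url):
        # A redefined image_key is a new function that lacks the _base marker
        if not hasattr(self.image_key, '_base'):
            return self.image_key(url)
        return 'full/%s.jpg' % hashlib.sha1(url.encode('utf-8')).hexdigest()

    def image_key(self, url):
        return self.file_path(url)
    image_key._base = True  # marks the default implementation


class LegacyOverride(MiniPipeline):
    # Old-style customisation: override image_key instead of file_path
    def image_key(self, url):
        return 'full/' + url.split('/')[-1]


print(MiniPipeline().file_path('http://www.example.com/image.jpg'))
print(LegacyOverride().file_path('http://www.example.com/image.jpg'))  # full/image.jpg
```

In the real source, file_key is checked before image_key, so an overridden file_key takes precedence.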

It contains this message:
ImagesPipeline.image_key(url) and file_key(url) methods are deprecated, please use file_path(request, response=None, info=None) instead
In other words, in the latest Scrapy release (0.22.2), file_path replaces image_key.
So I overrode file_path in my own ImagesPipeline subclass, but when I ran the spider, file_path was not called either.
Here is the code:

from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem
from scrapy.http import Request
import os


class DownPhotosPipeline(ImagesPipeline):

    def file_path(self, request):
        print "~~~~~~~~~~~~~~~~~~~~~~"
        print "~~~~~~~"+request.url+"~~~~~~~"
        print "~~~~~~~~~~~~~~~~~~~~~~"
        image_guid = request.url.split('/')[-1]
        return 'full/%s' % (image_guid)

    def get_media_requests(self, item, info):
        for image_url in item['images']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        #item['image_paths'] = image_paths
        return item
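item_completed receives results as a list of (success, info) 2-tuples: for a successful download, info is a dict with keys such as url, path and checksum; for a failed one, it is a Twisted Failure. The list comprehension in DownPhotosPipeline.item_completed keeps only the paths of successful downloads. A plain-Python illustration with made-up sample data (the URL and checksum are placeholders, and a string stands in for the Failure object):

```python
# Hypothetical sample of what item_completed receives: (success, info) pairs
results = [
    (True, {'url': 'http://www.example.com/image.jpg',
            'path': 'full/image.jpg',
            'checksum': 'placeholder-checksum'}),
    (False, 'stand-in for a twisted Failure'),  # a failed download
]

# Same filtering as in DownPhotosPipeline.item_completed
image_paths = [x['path'] for ok, x in results if ok]
print(image_paths)  # ['full/image.jpg']
```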

settings.py

DOWNLOAD_DELAY = 2
IMAGES_STORE = 'budejie_photos'
DOWNLOAD_TIMEOUT = 1200
ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
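As shown, this settings.py enables the stock ImagesPipeline rather than DownPhotosPipeline; a subclass only runs when its own import path is listed. Since Scrapy 0.20, ITEM_PIPELINES is documented as a dict mapping each pipeline's import path to an order value (conventionally 0 to 1000, lower values run first). A sketch, assuming the class lives in a module called budejie/pipelines.py; the module path is hypothetical and should be replaced with the project's real one.

```python
DOWNLOAD_DELAY = 2
IMAGES_STORE = 'budejie_photos'
DOWNLOAD_TIMEOUT = 1200

# Register the custom subclass instead of the base ImagesPipeline.
# 'budejie.pipelines' is a hypothetical module path; use your project's real one.
ITEM_PIPELINES = {
    'budejie.pipelines.DownPhotosPipeline': 1,
}
```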

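Solution: change
def file_path(self, request):
to
def file_path(self, request, response=None, info=None):
and the override is called. The pipeline passes info (and, after a download, response) to file_path as keyword arguments, so an override that accepts only request raises a TypeError before its body, including the print statements, ever runs. Inside file_path, just return the image's file name as a path relative to IMAGES_STORE, such as full/image.jpg.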
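The signature mismatch can be reproduced without Scrapy. This sketch assumes the caller passes info as a keyword argument, which matches the file_path(request, response=None, info=None) signature named in the deprecation message; FakeRequest, BrokenPipeline, FixedPipeline and call_like_pipeline are hypothetical stand-ins for Scrapy's Request, the question's pipeline, the fixed pipeline and Scrapy's internal call.

```python
from collections import namedtuple

FakeRequest = namedtuple('FakeRequest', 'url')  # hypothetical stand-in for scrapy.http.Request


class BrokenPipeline(object):
    def file_path(self, request):  # signature from the question
        print('never reached')
        return 'full/%s' % request.url.split('/')[-1]


class FixedPipeline(object):
    def file_path(self, request, response=None, info=None):  # signature from the answer
        return 'full/%s' % request.url.split('/')[-1]


def call_like_pipeline(pipeline, request):
    # Mimics a caller that passes info= as a keyword argument
    return pipeline.file_path(request, info=None)


req = FakeRequest('http://www.example.com/image.jpg')
try:
    call_like_pipeline(BrokenPipeline(), req)
except TypeError as exc:
    # Raised at call time, so 'never reached' is not printed
    print('TypeError: %s' % exc)
print(call_like_pipeline(FixedPipeline(), req))  # full/image.jpg
```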

An article from 编橙之家.
