Scrapy spider returns nothing


My idea is to enter a movie title and have the spider return that movie's information.

# -*- coding: utf-8 -*-
import sys
sys.path.append("..")
reload(sys)
sys.setdefaultencoding('utf8')
from scrapy.spider import Spider
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spiders import Rule,CrawlSpider
from items import doubanSpiderItem
from scrapy.contrib.linkextractors import LinkExtractor

class doubanSpider(CrawlSpider):
    name = 'doubanSpider'
    allowed_domains=[]
    start_urls = ['http://movie.douban.com/subject_search?search_text=%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB']

    def start_requests(self):
        # The prompt reads "Enter movie title:"
        movie_name = raw_input("输入电影名:")
        try:
            url_head = "http://movie.douban.com/subject_search?search_text="
            self.start_urls.append(url_head+str(movie_name))
            for url in self.start_urls:
                yield self.make_requests_from_url(url)
        except:
            print "can not connect"

    # Fetch the movie search results page
    def parse(self, response):
        sel=Selector(response)
        print sel
        movie_link = sel.xpath("//div[@class='pl2']/a/@href/text()").extract()
        print movie_link
        if movie_link:
            yield Request(movie_link[0],callback=self.parse_item)

    # Go to the page of the movie that was searched for
    def parse_item(self,response):
        sel = Selector(response)
        movie_name = sel.xpath("//span[@property = 'v:itemreviewed']/text()").extract()
        print movie_name
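A side note on start_requests above: the typed title is appended to start_urls, which already holds a hard-coded search URL, so every run issues two requests. The sketch below shows one way to build a single percent-encoded search URL from the title instead. It is written for Python 3 with the standard library only, whereas the post's code is Python 2 (where the same function is `urllib.quote`). The name `build_search_url` is hypothetical and not part of the original spider.

```python
from urllib.parse import quote

# Search endpoint taken from the post's own code.
SEARCH_URL = "http://movie.douban.com/subject_search?search_text="


def build_search_url(movie_name):
    """Percent-encode the title as UTF-8 and append it to the search URL.

    build_search_url is a hypothetical helper used only for illustration.
    """
    return SEARCH_URL + quote(movie_name.strip())


print(build_search_url("移动迷宫"))
```

For the title used in the post, this yields the same URL that is hard-coded in start_urls, so the hard-coded entry could probably be dropped and only the built URL requested.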

That is my code. Below is what the terminal prints:

timmys-MacBook-Pro:spiders apple$ scrapy crawl doubanSpider
/Users/apple/Desktop/doubanSpider/doubanSpider/spiders/doubanSpider.py:6: ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
  from scrapy.spider import Spider
/Users/apple/Desktop/doubanSpider/doubanSpider/spiders/doubanSpider.py:11: ScrapyDeprecationWarning: Module `scrapy.contrib.linkextractors` is deprecated, use `scrapy.linkextractors` instead
  from scrapy.contrib.linkextractors import LinkExtractor
2015-11-08 20:50:51 [scrapy] INFO: Scrapy 1.0.3 started (bot: doubanSpider)
2015-11-08 20:50:51 [scrapy] INFO: Optional features available: ssl, http11
2015-11-08 20:50:51 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'doubanSpider.spiders', 'SPIDER_MODULES': ['doubanSpider.spiders'], 'BOT_NAME': 'doubanSpider'}
2015-11-08 20:50:51 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-11-08 20:50:51 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-11-08 20:50:51 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-11-08 20:50:51 [scrapy] INFO: Enabled item pipelines: doubanSpiderPipeline
2015-11-08 20:50:51 [scrapy] INFO: Spider opened
2015-11-08 20:50:51 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-11-08 20:50:51 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
输入电影名:移动迷宫
2015-11-08 20:50:58 [scrapy] DEBUG: Crawled (200) <GET http://movie.douban.com/subject_search?search_text=%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB> (referer: None)
2015-11-08 20:50:58 [scrapy] DEBUG: Crawled (200) <GET http://movie.douban.com/subject_search?search_text=%E7%A7%BB%E5%8A%A8%E8%BF%B7%E5%AE%AB> (referer: None)
<Selector xpath=None data=u'<html lang="zh-CN" class="">\n<head>\n    '>
[]
<Selector xpath=None data=u'<html lang="zh-CN" class="">\n<head>\n    '>
[]
2015-11-08 20:50:58 [scrapy] INFO: Closing spider (finished)
2015-11-08 20:50:58 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 554,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 16250,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 11, 8, 12, 50, 58, 566941),
 'log_count/DEBUG': 3,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2015, 11, 8, 12, 50, 51, 888328)}
2015-11-08 20:50:58 [scrapy] INFO: Spider closed (finished)
timmys-MacBook-Pro:spiders apple$

Then there is the Douban HTML.
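A likely cause of the empty lists `[]` printed in the terminal output is the XPath `//div[@class='pl2']/a/@href/text()`. In XPath an attribute node has no child nodes, so asking for `text()` beneath `@href` matches nothing, even when the link exists. In Scrapy the fix would probably be to end the expression at `/@href`. The sketch below illustrates the same idea with Python 3's standard library rather than Scrapy: locate the `<a>` element, then read its `href` attribute directly. The sample markup, the subject URL in it, and the name `first_result_link` are all made up for illustration, since the real Douban page is not reproduced in the post and may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for one search result.
# The real Douban markup is not shown in the post and may differ.
SAMPLE = """
<html><body>
  <div class="pl2">
    <a href="http://movie.douban.com/subject/0000000/">Example title</a>
  </div>
</body></html>
"""


def first_result_link(markup):
    """Return the href of the first <a> directly under a div with class pl2."""
    root = ET.fromstring(markup.strip())
    anchor = root.find(".//div[@class='pl2']/a")
    if anchor is None:
        return None
    # The URL is the attribute's own value; there is no text node below it.
    return anchor.get("href")


print(first_result_link(SAMPLE))
```

If the list is still empty after dropping `/text()`, the next thing to check would be whether the downloaded page really contains a `div` whose class is exactly `pl2`.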
