Why does the crawler in this Python source code have no effect?


item.py

# -*- coding: utf-8 -*-
import scrapy

class BokeItem(scrapy.Item):
    url = scrapy.Field()
    title = scrapy.Field()
    content = scrapy.Field()

boke_spider.py

# -*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from boke.items import BokeItem

class BokeItem(CrawlSpider):
    name = 'blog'
    start_urls = ['http://blog.sina.com.cn/s/blog_4701280b0102eo83.html']

    def parse_torrent(self, response):
        torrent = BokeItem()
        torrent['url'] = response.url
        torrent['title'] = response.xpath("//h2[@class='titName SG_txta']/text()").extract()[0]
        torrent['content'] = response.xpath("//div[@style='min-height:22px']/text()").extract()[0]
        return torrent

Take a look at this blog; it focuses specifically on Scrapy.

Also try reading the official docs.

from scrapy.contrib.spiders import CrawlSpider, Rule

You are subclassing CrawlSpider, but you clearly have not written any rules, so parse_torrent is never called.

I suggest switching to the Spider class and renaming parse_torrent to parse, like this:

from scrapy.contrib.spiders import Spider
from boke.items import BokeItem

class BokeItem(Spider):
