Installing the Python crawler framework Scrapy on Linux

Shared by Byrx.net on 2019-05-30 07:05:30

http://scrapy.org/

Scrapy is, in my view, the most convenient crawler framework for Python. Requirement (at the time of writing): Python 2.7.x.
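As a quick pre-install check, the sketch below reports the running interpreter's version and whether it is in the 2.7.x series this guide targets. It is illustrative only: is_python27 is a hypothetical helper, and Scrapy releases from 1.1 onward also support Python 3, so a False result on a newer system does not mean Scrapy cannot be installed there.

```python
import sys


def is_python27(version_info):
    """Return True if the (major, minor, ...) tuple is in the 2.7.x series.

    is_python27 is a hypothetical helper used only for this check.
    """
    return tuple(version_info[:2]) == (2, 7)


print("Running Python %d.%d" % tuple(sys.version_info[:2]))
print("Matches the Python 2.7.x requirement: %s" % is_python27(sys.version_info))
```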

1. Ubuntu 14.04

1.1 Check whether pip is already installed

    $ pip --version

If pip is missing, install it:

    $ sudo apt-get install python-pip
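pip --version only describes the pip found first on your PATH. To confirm from Python itself whether pip and Scrapy are importable by a particular interpreter (useful on RHEL 6, whose system python is 2.6, which is why section 2 calls python2.7 explicitly), a small sketch like this can help. module_version is a hypothetical helper; it relies on the __version__ attribute that both pip and Scrapy define.

```python
import importlib


def module_version(name):
    """Return the module's __version__, "unknown" if it has none,
    or None if the module cannot be imported.

    module_version is a hypothetical helper for illustration.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Report on the packages this guide installs, for the interpreter running this script.
for name in ("pip", "scrapy"):
    version = module_version(name)
    if version is None:
        print("%s: not installed" % name)
    else:
        print("%s: %s" % (name, version))
```

Run it with the interpreter you plan to use Scrapy with (for example python2.7 on RHEL).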

1.2 Then install Scrapy

Import the GPG key used to sign Scrapy packages into the APT keyring:

    $ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 627220E7

Create the /etc/apt/sources.list.d/scrapy.list file with the following command:

    $ echo 'deb http://archive.scrapy.org/ubuntu scrapy main' | sudo tee /etc/apt/sources.list.d/scrapy.list
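The line written to scrapy.list is a one-line-style APT source entry: the archive type (deb), the repository URI, the suite (here scrapy), and one or more components (here main). A minimal parser makes those fields explicit. It is a sketch that ignores the optional [option=value] block, and parse_deb_line is a hypothetical helper, not something APT provides.

```python
def parse_deb_line(line):
    """Split a one-line-style APT source entry into its fields.

    Assumes the simple form 'deb URI SUITE COMPONENT...' with no
    '[option=value]' block. parse_deb_line is a hypothetical helper.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0] not in ("deb", "deb-src"):
        raise ValueError("not a one-line APT source entry: %r" % line)
    return {
        "type": fields[0],        # binary (deb) or source (deb-src) packages
        "uri": fields[1],         # repository base URL
        "suite": fields[2],       # distribution/suite name in the repository
        "components": fields[3:], # zero or more components
    }


entry = parse_deb_line("deb http://archive.scrapy.org/ubuntu scrapy main")
print(entry["uri"])         # http://archive.scrapy.org/ubuntu
print(entry["suite"])       # scrapy
print(entry["components"])  # ['main']
```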

Update the package lists and install the scrapy package, then install the service_identity module with pip (--timeout 10000 raises pip's network timeout to 10000 seconds, which helps on slow connections):

    $ sudo apt-get update && sudo apt-get install scrapy
    $ sudo pip install service_identity --timeout 10000

Install pyasn1-0.1.8 from source:

    $ wget https://pypi.python.org/packages/source/p/pyasn1/pyasn1-0.1.8.tar.gz#md5=7f6526f968986a789b1e5e372f0b7065
    $ tar -zxvf pyasn1-0.1.8.tar.gz
    $ cd pyasn1-0.1.8
    $ sudo python setup.py install
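The #md5=... fragment on the download URL is the checksum PyPI published for the tarball. wget does not verify it, so it is worth checking before running setup.py. From the shell, md5sum pyasn1-0.1.8.tar.gz prints the digest; the Python sketch below does the same comparison. md5_of_file and verify_download are hypothetical helper names, and the same check applies to the pip-1.5.4 tarball in section 2.1 with its own digest.

```python
import hashlib

# Digest copied from the '#md5=...' fragment of the pyasn1 download URL.
EXPECTED_MD5 = "7f6526f968986a789b1e5e372f0b7065"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

For an intact download, verify_download("pyasn1-0.1.8.tar.gz") should return True.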

2. RHEL 6.4

2.1 Install pip

    # wget "https://pypi.python.org/packages/source/p/pip/pip-1.5.4.tar.gz#md5=834b2904f92d46aaa333267fb1c922bb" --no-check-certificate
    # tar -xzvf pip-1.5.4.tar.gz
    # cd pip-1.5.4
    # python2.7 setup.py install

2.2 Then install Scrapy

    # pip install scrapy --timeout 10000

TODO: The download is very slow; I'll finish this section once it completes.

3. A worked example

3.1 Create a spider: stackoverflow.py

    #!/usr/bin/python2.7
    # -*- coding: UTF-8 -*-
    # stackoverflow.py
    #
    # Collects the highest-voted Stack Overflow questions.
    # The CSS selectors reflect Stack Overflow's markup when this post
    # was written and may need updating.
    import scrapy


    class StackOverflowSpider(scrapy.Spider):
        name = 'stackoverflow'
        start_urls = ['http://stackoverflow.com/questions?sort=votes']

        def parse(self, response):
            # Follow the link of every question on the listing page.
            for href in response.css('.question-summary h3 a::attr(href)'):
                full_url = response.urljoin(href.extract())
                yield scrapy.Request(full_url, callback=self.parse_question)

        def parse_question(self, response):
            # Extract one item (a dict) from a single question page.
            yield {
                'title': response.css('h1 a::text').extract()[0],
                'votes': response.css('.question .vote-count-post::text').extract()[0],
                'body': response.css('.question .post-text').extract()[0],
                'tags': response.css('.question .post-tag::text').extract(),
                'link': response.url,
            }
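In parse, the href values on the listing page are relative paths, which is why each one goes through response.urljoin before being requested. For an HTML response this resolves the link against the page's base URL (the page URL unless the page has a <base> tag), so the effect can be tried with the standard library's urljoin without Scrapy. The question paths below are made up for illustration.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

PAGE_URL = "http://stackoverflow.com/questions?sort=votes"

# Hypothetical relative links, shaped like those on the listing page.
hrefs = ["/questions/123/example-question", "/questions/456/another-one"]

# An absolute path replaces the page's path and query string.
full_urls = [urljoin(PAGE_URL, href) for href in hrefs]
for url in full_urls:
    print(url)
```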

3.2 Run the spider

    $ scrapy runspider stackoverflow.py -o top-ques.json

3.3 View top-ques.json in a JSON viewer

Paste the contents of top-ques.json into

http://www.json.cn/

and see what the spider collected!
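You can also inspect the file locally. Because the output file ends in .json, scrapy runspider ... -o writes the items as a single JSON array (delete the file before re-running, since -o appends to an existing file and the result is no longer valid JSON). The sketch below loads that array and lists the questions by vote count; requests run concurrently, so the items are not stored in vote order. top_questions and vote_count are hypothetical helpers, and the sketch assumes the 'title' and 'votes' fields yielded by parse_question, with votes stored as scraped text.

```python
import json


def vote_count(item):
    """Parse the scraped 'votes' text as an int; use 0 if missing or non-numeric."""
    try:
        return int(item["votes"])
    except (KeyError, TypeError, ValueError):
        return 0


def top_questions(path, limit=10):
    """Return up to `limit` (votes, title) pairs, highest vote count first.

    Assumes `path` holds the JSON array written by
    `scrapy runspider stackoverflow.py -o top-ques.json`.
    """
    with open(path) as f:
        items = json.load(f)
    ranked = sorted(items, key=vote_count, reverse=True)
    return [(vote_count(item), item.get("title")) for item in ranked[:limit]]
```

For example: for votes, title in top_questions("top-ques.json"): print("%d  %s" % (votes, title))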

Enjoy!
