A simple crawler

Shared by Byrx.net on 2019-03-23 07:03:23


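import urllib.request

# Print every .html address found in the href attribute of the <a> tags on one URL
content = urllib.request.urlopen('http://www.hoopchina.com').read().decode('utf-8', errors='ignore')
s1 = 0
while True:
    begin = content.find('<a', s1)
    if begin == -1:  # no more <a> tags: stop (avoids searching from index -1)
        break
    m1 = content.find('href=', begin)
    m2 = content.find('>', m1)
    if m1 == -1 or m2 == -1:
        break
    if content[m1:m2].find('.html') != -1:
        m2 = content.find('.html', m1)
        # Skip 'href="' (6 chars, assumes a quoted value) and keep '.html' (5 chars)
        url = content[m1 + 6:m2 + 5]
        print(url)
    s1 = m2  # always past begin, so the scan makes progress
# Snippet from http://byrx.net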
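The string-scanning loop only recognizes lowercase `<a` and `href=`, and it assumes the attribute value is wrapped in quotes. As a minimal sketch of a swapped-in technique (not the original author's method), the same extraction can be done with the standard library's `html.parser.HTMLParser`, which lower-cases tag and attribute names and handles quoted or unquoted values. The names `HtmlLinkCollector` and `extract_html_links` are hypothetical. The function takes a string so it can be tried without network access. To run it against the page, pass it the decoded `content`.

```python
from html.parser import HTMLParser


class HtmlLinkCollector(HTMLParser):
    """Collect href values of <a> tags that contain '.html'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value) pairs
        if tag != 'a':
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>
            if name == 'href' and value and '.html' in value:
                # Like the original snippet, cut the address right after '.html'
                end = value.find('.html') + len('.html')
                self.links.append(value[:end])


def extract_html_links(page):
    """Return the .html addresses found in <a href=...> tags of an HTML string."""
    parser = HtmlLinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links


sample = ('<p><a href="/news/1.html">one</a> '
          '<A HREF=two.html?x=1>two</A> '
          '<a href="/img.png">img</a></p>')
print(extract_html_links(sample))  # ['/news/1.html', 'two.html']
```

The uppercase, unquoted `HREF=two.html?x=1` link is picked up here, but the find-based loop would miss it.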
