Demo code: fetching a web page and all the links on it with Python
Shared by Byrx.net on 2019-03-23 08:03:15
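# Python 2 only: urllib.urlopen, htmllib and formatter do not exist in current Python 3.
import urllib, htmllib, formatter, re, sys

# The host name (without "http://") is passed as the first command-line argument.
url = sys.argv[1]
website = urllib.urlopen("http://" + url)
data = website.read()
website.close()

# HTMLParser needs a formatter; NullWriter discards the rendered text.
format = formatter.AbstractFormatter(formatter.NullWriter())
ptext = htmllib.HTMLParser(format)
ptext.feed(data)

# anchorlist holds the href of every <a> tag found on the page.
links = ptext.anchorlist
for link in links:
    # Only follow links that contain 'http'; relative links are skipped.
    if re.search('http', link) != None:
        print(link)
        website = urllib.urlopen(link)
        data = website.read()
        website.close()
        ptext = htmllib.HTMLParser(format)
        ptext.feed(data)
        morelinks = ptext.anchorlist
        # Links appended here are visited later by the outer loop,
        # so the crawl keeps going with no depth or page limit.
        for alink in morelinks:
            if re.search('http', alink) != None:
                links.append(alink)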
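
The script above depends on modules that are only available in Python 2. As a rough sketch of the same idea in Python 3, the version below uses `html.parser` and `urllib.request` from the standard library. The names `LinkCollector`, `extract_links`, `fetch`, `crawl` and the `max_pages` limit are illustrative assumptions and are not part of the original script. It also differs slightly in behavior: it keeps only links that start with `http` (the original keeps any link containing `http`), prints the start page as well, and skips pages it has already visited.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, similar to htmllib's anchorlist."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.anchorlist.append(value)


def extract_links(html):
    """Return the links in an HTML string that start with 'http'."""
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.anchorlist if link.startswith("http")]


def fetch(url):
    """Download a page and decode it as text (UTF-8 is assumed)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=20, fetch=fetch):
    """Visit start_url and the pages it links to, up to max_pages pages."""
    queue = [start_url]
    seen = set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue  # already visited
        seen.add(url)
        print(url)
        try:
            queue.extend(extract_links(fetch(url)))
        except (OSError, ValueError):
            continue  # unreachable page or unsupported URL: skip it
    return seen
```

To mirror the original command-line usage, one could call `crawl("http://" + sys.argv[1])` after `import sys`. The `max_pages` cap is a design choice added here so that the crawl terminates; the original loop has no such limit.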