Using Python's HTMLParser to extract all links from a web page


Example code using Python's HTMLParser. The parser collects the href attribute of every <a> tag on the page:

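```python
# Python 3: the HTMLParser module became html.parser, urllib.urlopen became urllib.request.urlopen
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs; skip <a> tags that have no href
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)


# Read at most 200000 bytes of the page and decode the bytes to text
htmlSource = urlopen("http://www.sharejs.com").read(200000).decode('utf-8', errors='replace')
p = LinkParser()
p.feed(htmlSource)
for link in p.links:
    print(link)
```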
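A variation of the example above, offered as a sketch rather than the original author's code: it resolves relative hrefs such as "/about" into absolute URLs with urllib.parse.urljoin, drops duplicates, and parses an inline HTML string so it can be tried without network access. The names LinkCollector and extract_links, the sample HTML, and the base URL http://www.example.com/ are all hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute URLs from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href:
            # Resolve relative paths against the page's base URL
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:  # keep only the first occurrence
                self.links.append(absolute)


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page used in place of a real download
sample = '''
<p><a href="/about">About</a>
<a name="top">anchor without href</a>
<a href="https://example.org/x">X</a>
<a href="/about">About again</a></p>
'''

for link in extract_links(sample, "http://www.example.com/"):
    print(link)
```

Running it prints http://www.example.com/about and https://example.org/x; the <a name="top"> tag is skipped because it has no href, and the repeated "/about" link appears only once.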
