A Simple E-mail Crawler in Python

Shared by Byrx.net on 2019-03-23 05:03:33. Comments (507)


import re

import requests

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

# Regexes for e-mail addresses and href targets
email_re = re.compile(r'([\w\.,]+@[\w\.,]+\.\w+)')
link_re = re.compile(r'href="(.*?)"')


def crawl(url):
    result = set()

    req = requests.get(url)

    # Check if successful
    if req.status_code != 200:
        return result

    # Collect e-mails on the start page itself
    result.update(email_re.findall(req.text))

    # Find links
    links = link_re.findall(req.text)
    print("\nFound {} links".format(len(links)))

    # Search linked pages for e-mails
    for link in links:
        # Get an absolute URL for a link
        link = urljoin(url, link)

        # Skip links that cannot be fetched (mailto:, javascript:, dead hosts)
        try:
            page = requests.get(link, timeout=10)
        except requests.RequestException:
            continue

        # Find all e-mails on the linked page
        if page.status_code == 200:
            result.update(email_re.findall(page.text))

    return result


if __name__ == '__main__':
    emails = crawl('http://www.realpython.com')

    print("\nScraped e-mail addresses:")
    for email in emails:
        print(email)
    print("\n")
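To see what the two regular expressions pick up without touching the network, here is a minimal offline sketch. The helper name `extract`, the `SAMPLE_HTML` string and the example.com addresses are made up for illustration and are not part of the original snippet; only `email_re`, `link_re` and `urljoin` come from the code above.

```python
import re
from urllib.parse import urljoin

# Same patterns as in the crawler above
email_re = re.compile(r'([\w\.,]+@[\w\.,]+\.\w+)')
link_re = re.compile(r'href="(.*?)"')


def extract(base_url, html):
    """Hypothetical helper: return (absolute links, e-mails) found in an HTML string."""
    links = [urljoin(base_url, href) for href in link_re.findall(html)]
    emails = set(email_re.findall(html))
    return links, emails


# Made-up page content, used only for this demonstration
SAMPLE_HTML = '''
<a href="/about">About</a>
<a href="https://example.org/contact">Contact</a>
<p>Write to info@example.com or sales@example.com</p>
'''

links, emails = extract('http://example.com/index.html', SAMPLE_HTML)
print(links)
print(sorted(emails))
```

The relative link `/about` is resolved against the base URL, while the absolute link is left unchanged. Note that `link_re` only matches double-quoted `href` attributes, so links written with single quotes would be missed; an HTML parser would likely be more robust for real pages.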
