Python script for checking a keyword's ranking on Baidu
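The following Python 2 script uses urllib, urllib2, and the re regular-expression module to find where a given site ranks in Baidu's search results for a particular keyword.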
# -*- coding: utf-8 -*-
import re
import urllib
import urllib2
from urllib import quote_plus
from urlparse import urlparse


def get_site_word_baidu_rank(siteHost, word, maxScanPageNumber=10, printSearchLog=False):
    """Return (rank, page, resolved URL, search URL) for siteHost, or None if not found."""

    def printLog(log):
        if printSearchLog:
            print log

    # Compiled once: captures the result link and the displayed "url date" text.
    itemPattern = re.compile(
        r'<h3 class="t"><a[\s]+data-click="[^"]+" href="(?P<url>[^"]+)"'
        r'.*?<span class="g">(?P<urldate>[^<]+)</span>')

    pageSize = 10
    siteHost = siteHost.lower()

    # Scan pages 1..maxScanPageNumber inclusive (the original loop stopped one page early).
    for page in range(1, maxScanPageNumber + 1):
        searchUrl = ('http://www.baidu.com/s?wd=' + quote_plus(word) +
                     '&pn=' + str((page - 1) * pageSize) +
                     '&tn=baiduhome_pg&ie=utf-8&usm=2')
        printLog('Searching page %d' % (page,))
        html = urllib.urlopen(searchUrl).read()

        number = 0  # position of the current result within this page
        for m in itemPattern.finditer(html):
            number += 1
            # The displayed text is "<url> <date>"; keep only the URL part.
            urldate = m.group('urldate').strip()
            siteUrl = urldate.split(' ')[0]
            netloc = urlparse('http://' + siteUrl).netloc
            host = netloc.split(':')[0].lower()  # drop any port
            # Match the site itself or any of its subdomains.
            if host == siteHost or host.endswith('.' + siteHost):
                # Open the result link and read back the final URL it resolves to.
                gotUrl = urllib2.urlopen(m.group('url')).geturl()
                rank = (page - 1) * pageSize + number
                return (rank, page, gotUrl, searchUrl)
    return None


if __name__ == '__main__':
    words = ('程序员', '内存溢出', 'Outofmemory', 'python', 'java')
    siteHost = 'byrx.net'
    for w in words:
        result = get_site_word_baidu_rank(siteHost, w, 10)
        if result:
            print w + ': your site ranks #%d, on page %d; matching link: %s; search page: %s' % result
        else:
            print w + ': no record found'
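The script above targets Python 2, where urllib2 and urlparse are separate modules. As a minimal sketch, assuming Python 3, the network-free parts of its logic (building the search URL, matching a result's host against the site, and turning a page and position into an overall rank) could look like the following. The helper names build_search_url, host_matches, and overall_rank are hypothetical and do not appear in the original script; the query parameters are copied from it unchanged.

```python
from urllib.parse import quote_plus, urlparse


def build_search_url(word, page, page_size=10):
    """Build the search URL for a 1-based page number, as the script does."""
    offset = (page - 1) * page_size  # the script passes this offset as pn
    return ('http://www.baidu.com/s?wd=' + quote_plus(word)
            + '&pn=' + str(offset) + '&tn=baiduhome_pg&ie=utf-8&usm=2')


def host_matches(display_text, site_host):
    """Return True if the displayed "url date" text points at site_host or a subdomain."""
    parts = display_text.split()
    if not parts:
        return False
    # hostname drops any port; lower() makes the comparison case-insensitive
    host = (urlparse('http://' + parts[0]).hostname or '').lower()
    site_host = site_host.lower()
    return host == site_host or host.endswith('.' + site_host)


def overall_rank(page, position, page_size=10):
    """Convert a 1-based page and 1-based position on that page to an overall rank."""
    return (page - 1) * page_size + position


if __name__ == '__main__':
    print(build_search_url('python', 3))
    print(host_matches('www.byrx.net/ 2014-01-01', 'byrx.net'))
    print(overall_rank(2, 3))
```

Matching with endswith('.' + site_host) rather than a substring search keeps a host such as byrx.net.example.com from being counted as a hit for byrx.net.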