Inflating blog view counts with Python (needs improvement), Python 3.4

Shared by Byrx.net on 2019-05-30 07:05:42


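This is written for Python 3.4 and uses the urllib.request, re, and bs4 libraries.

After spending a long time reading crawler code on a MOOC, I figured I could pull off this cheeky little trick myself.

The script does fetch the pages without trouble, but the view count does not go up. Sad times.

My guess is that the cookies need to be cleared on every visit. I'll fix it when I have time.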
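The cookie guess can be checked against the standard library. As far as I know, an opener created by urllib.request.build_opener() with no arguments does not include an HTTPCookieProcessor, so it keeps no cookies between requests. The sketch below inspects the opener's handler list to confirm this. The helper name has_cookie_handler is made up for illustration and is not part of the original script.

```python
import urllib.request


def has_cookie_handler(opener):
    """Return True if the opener carries a cookie-processing handler."""
    return any(
        isinstance(handler, urllib.request.HTTPCookieProcessor)
        for handler in opener.handlers
    )


# Same construction as in the script: no extra handlers passed in.
plain = urllib.request.build_opener()

# For comparison, an opener that was explicitly given cookie support.
with_cookies = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())

print(has_cookie_handler(plain))
print(has_cookie_handler(with_cookies))
```

If the first check reports no cookie handler, there are no stored cookies to clear, and the unchanged view count probably has some other cause. The script alone does not show what that cause is.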

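import random
import re
import time
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

# The 12 wildcard characters cover file names such as 5532399.html
p = re.compile('/MnsterLu/p/............')

# My own blog home page
url = "http://www.cnblogs.com/MnsterLu/"
# Example post URLs:
# http://www.cnblogs.com/MnsterLu/p/5532399.html
# http://www.cnblogs.com/MnsterLu/p/5518372.html

# Make Python imitate a browser when it requests pages
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
html = opener.open(url).read().decode('utf-8')
allfinds = p.findall(html)
print(allfinds)

urlBase = "http://www.cnblogs.com"  # prefix to join with each matched path
# The same link appears more than once on the page, so deduplicate with set
mypages = list(set(allfinds))
for i in range(len(mypages)):
    mypages[i] = urlBase + mypages[i]

print('Pages to visit:')
for index, page in enumerate(mypages):
    print(str(index), page)

# Upper bound on visits per page (the actual count is drawn at random)
brushMax = 200

# Visit every page
print('Starting:')
for index, page in enumerate(mypages):
    brushNum = random.randint(0, brushMax)
    for j in range(brushNum):
        try:
            pageContent = opener.open(page).read().decode('utf-8')
            # Use BeautifulSoup to parse the title of each post;
            # naming the parser explicitly avoids a warning
            soup = BeautifulSoup(pageContent, 'html.parser')
            blogTitle = str(soup.title.string)
            # Keep the part before the first hyphen; unlike slicing with
            # find(), this keeps the whole title when there is no hyphen
            blogTitle = blogTitle.split('-')[0]
            print(str(j), blogTitle)
        except urllib.error.HTTPError:
            print('urllib.error.HTTPError')
            time.sleep(1)  # on error, pause for a moment first
        except urllib.error.URLError:
            print('urllib.error.URLError')
            time.sleep(1)  # on error, pause for a moment first
        time.sleep(0.5)  # normal pause, so the server does not refuse access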
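The link-collection step (match paths, deduplicate, prepend the site root) can be tried offline on a small HTML string. This is only a sketch under assumptions: the function name collect_post_urls, the sample markup, and the stricter `\d+\.html` pattern are illustrative and not taken from the script above. It uses urllib.parse.urljoin instead of string concatenation and sorts the result so the order is stable across runs.

```python
import re
from urllib.parse import urljoin

# Assumed pattern: post paths with a numeric file name.
POST_PATH = re.compile(r'/MnsterLu/p/\d+\.html')


def collect_post_urls(html, base="http://www.cnblogs.com"):
    """Return the unique post URLs found in html, in sorted order."""
    paths = sorted(set(POST_PATH.findall(html)))
    return [urljoin(base, path) for path in paths]


# Hypothetical markup where one post is linked twice.
sample_html = '''
<a href="/MnsterLu/p/5532399.html">post A</a>
<a href="/MnsterLu/p/5532399.html#comments">comments</a>
<a href="/MnsterLu/p/5518372.html">post B</a>
'''

for post_url in collect_post_urls(sample_html):
    print(post_url)
```

Sorting is optional. It only makes the printed list reproducible, since iteration order over a set is not guaranteed.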
