抓取知乎日报内容在本地另存为txt文档,日报txt,#Filename:ge
文章由Byrx.net分享于2019-03-23 05:03:59
抓取知乎日报内容在本地另存为txt文档,日报txt,#Filename:ge
#Filename:getZhihu.pyimport re,osimport urllib2from bs4 import BeautifulSoupimport sysimport timereload(sys)sys.setdefaultencoding("utf-8")def getHtml(url): header={'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1','Referer' : '******'} request=urllib2.Request(url,None,header) response=urllib2.urlopen(request) text=response.read() return textdef mkDir(): date=time.strftime('%Y-%m-%d',time.localtime(time.time())) os.mkdir(str(date))def saveText(text): date=time.strftime('%Y-%m-%d',time.localtime(time.time())) dir_name="/home/wang/Documents/py/Zhihu/"+date soup=BeautifulSoup(text)# i=1# for i in soup.h2:# i=i+1 if soup.h2.get_text()=='': filename=dir_name+"/ad.txt" fp=file(filename,'w') content=soup.find('div',"content") content=content.get_text() fp.write(content) fp.close()# elif i > 1:# filename=dir_name+"/kiding.txt"# contents=soup.findAll('div',"content")+soup.findAll("div","question")# contents=contents.get_text()# fp=file(filename,'w')# fp.write(contents)# fp.close() else: filename=dir_name+"/"+soup.h2.get_text()+".txt" fp=file(filename,'w') content=soup.find('div',"content") content=content.get_text() fp.write(content) fp.close()# print content #testdef getUrl(url): html=getHtml(url) # print html soup=BeautifulSoup(html) urls_page=soup.find('div',"post-body")# print urls_page urls=re.findall('"((http)://.*?)"',str(urls_page)) return urls def main(): mkDir() page="http://zhihudaily.ahorn.me" urls=getUrl(page) for url in urls: text=getHtml(url[0]) saveText(text)if __name__=="__main__": main()
相关内容
- 最美应用爬虫,应用爬虫,import reque
- 获取网页上图片的下载地址,获取图片下载地址,#!/us
- python和bash统计CPU利用率,pythonbash利用率,#!/usr/bin/p
- python 爬小说我梦见了欲望,,import urlli
- 多任务多线程任务管理类,多任务多线程管理类,#codi
- 每行数据重复N次合并生成新文件,数据重复n合并,""
- Python用户定义类实现,python用户定义,class Worker
- Python模拟http get,python模拟get,import sys,
- python的get set方法示例,pythongetset示例,class Critte
- Python布隆过滤器实现代码,python布隆过滤器,之前用py
评论关闭