Python daily: scraping 250 movie records from Douban
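# Anyone interested is welcome to swap notes.
import csv

import lxml.html
import requests

doubanUrl = 'https://movie.douban.com/top250?start={}&filter='

# Douban usually rejects the default python-requests User-Agent (often with
# HTTP 418), so send a browser-like one.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36'}


def getSource(url):
    response = requests.get(url, headers=HEADERS, timeout=10)  # fetch the page
    response.raise_for_status()   # stop on 4xx/5xx instead of parsing an error page
    response.encoding = 'utf-8'   # decode the body as UTF-8
    return response.text          # page source as a str


def getEveryItem(source):
    # Build an HTML element tree from the page source
    selector = lxml.html.document_fromstring(source)
    # Each movie's details sit inside one <div class="info"> block
    movieItemList = selector.xpath('//div[@class="info"]')
    # One dict per movie
    movieList = []
    for eachMovie in movieItemList:
        movieDict = {}
        # Extract field by field, relative to the current movie block
        title = eachMovie.xpath('div[@class="hd"]/a/span[@class="title"]/text()')
        otherTitle = eachMovie.xpath('div[@class="hd"]/a/span[@class="other"]/text()')
        link = eachMovie.xpath('div[@class="hd"]/a/@href')[0]
        # The rating and the one-line quote live in the "bd" block
        star = eachMovie.xpath('div[@class="bd"]/div[@class="star"]/span[@class="rating_num"]/text()')
        quote = eachMovie.xpath('div[@class="bd"]/p[@class="quote"]/span/text()')
        # Store plain strings; some movies have no quote, so fall back to ''
        movieDict['title'] = ''.join(title + otherTitle)
        movieDict['url'] = link
        movieDict['star'] = star[0] if star else ''
        movieDict['quote'] = quote[0] if quote else ''
        movieList.append(movieDict)
    return movieList


def writeData(movieList):
    with open('./Douban.csv', 'w', encoding='UTF-8', newline='') as f:
        # fieldnames must match the dict keys exactly
        writer = csv.DictWriter(f, fieldnames=['title', 'star', 'quote', 'url'])
        writer.writeheader()  # write the header row
        for each in movieList:
            writer.writerow(each)


if __name__ == '__main__':
    # 250 movies in total: 25 per page, 10 pages
    movieList = []
    for i in range(10):
        # Page i starts at offset i*25
        pageLink = doubanUrl.format(i * 25)
        print(pageLink)
        # Fetch the page and parse it
        source = getSource(pageLink)
        # extend, not assign, so every page is kept
        movieList.extend(getEveryItem(source))
    print(movieList[:10])
    writeData(movieList)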
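You can check the paging arithmetic and the CSV step offline before sending any request. Below is a minimal sketch using only the standard library. The helpers `page_links` and `rows_to_csv` are hypothetical and not part of the original script. The sample row is a placeholder, not real Douban data.

```python
import csv
import io

# Same URL template as the scraper; the Top 250 list is paged via ?start=
DOUBAN_URL = 'https://movie.douban.com/top250?start={}&filter='
PAGE_SIZE = 25    # movies per page
PAGE_COUNT = 10   # 10 pages x 25 movies = 250
FIELDNAMES = ['title', 'star', 'quote', 'url']


def page_links():
    """Hypothetical helper: page URLs for start = 0, 25, ..., 225."""
    return [DOUBAN_URL.format(i * PAGE_SIZE) for i in range(PAGE_COUNT)]


def rows_to_csv(rows, fieldnames=FIELDNAMES):
    """Hypothetical helper: render dict rows as CSV text in memory.

    csv.DictWriter raises ValueError when a row has a key that is not in
    fieldnames, so the header names must match the dict keys exactly.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == '__main__':
    links = page_links()
    print(len(links), links[0], links[-1])
    # Placeholder row shaped like one dict returned by getEveryItem()
    sample = [{'title': 'placeholder', 'star': '0.0', 'quote': '',
               'url': 'https://example.com/'}]
    print(rows_to_csv(sample))
```

Running it prints the first and last page URLs and a two-line CSV. If you replace the placeholder rows with the output of getEveryItem(), you get the same columns that writeData() writes to Douban.csv.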