Python crawler: scraping Toutiao (AJAX asynchronous loading) with Selenium
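from selenium import webdriver
from selenium.webdriver.common.by import By
from lxml import etree
from pyquery import PyQuery as pq
import time

driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.toutiao.com/')
driver.implicitly_wait(10)

# Open the "科技" (Technology) channel.
# The find_element_by_* helpers were removed in recent Selenium 4 releases,
# so find_element(By.LINK_TEXT, ...) is used instead.
driver.find_element(By.LINK_TEXT, '科技').click()
driver.implicitly_wait(10)

# Scroll down in 500px steps so the AJAX feed loads more items.
for x in range(3):
    js = "var q=document.documentElement.scrollTop=" + str(x * 500)
    driver.execute_script(js)
    time.sleep(2)
time.sleep(5)

# Parse the rendered page source.
page = driver.page_source
doc = pq(page)
doc = etree.HTML(str(doc))
contents = doc.xpath('//div[@class="wcommonFeed"]/ul/li')
print(contents)

# Append each non-empty title to toutiao.txt.
for x in contents:
    title = x.xpath('div/div[1]/div/div[1]/a/text()')
    if title:
        title = title[0]
        with open('toutiao.txt', 'a+', encoding='utf8') as f:
            f.write(title + '\n')
        print(title)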
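The script above scrolls a fixed three times, which may stop before the AJAX feed has finished loading. As a rough sketch of one alternative, the helper below keeps scrolling until the number of feed items stops growing. The names `scroll_until_stable` and `count_items`, and the parameters `step`, `max_rounds` and `pause`, are hypothetical and not part of the original post or of Selenium itself; only `driver.execute_script` is a real WebDriver call.

```python
import time


def scroll_until_stable(driver, count_items, step=500, max_rounds=20, pause=2.0):
    """Scroll down in `step`-pixel increments until the item count stops growing.

    `driver` is any object with an execute_script(script, *args) method
    (for example a Selenium WebDriver). `count_items` is a caller-supplied
    function (hypothetical helper) that returns how many feed items are
    currently on the page. Returns the last observed item count.
    """
    last_count = count_items(driver)
    for round_no in range(1, max_rounds + 1):
        # arguments[0] receives the extra argument passed to execute_script.
        driver.execute_script(
            "document.documentElement.scrollTop = arguments[0];",
            round_no * step,
        )
        time.sleep(pause)  # give the AJAX request time to finish
        current = count_items(driver)
        if current == last_count:
            break  # nothing new appeared after this scroll
        last_count = current
    return last_count
```

With a real browser, `count_items` could be something like `lambda d: len(d.find_elements(By.XPATH, '//div[@class="wcommonFeed"]/ul/li'))`, assuming the page still uses that markup. Stopping after a single scroll with no new items is a simplification: a slow network response may need a longer `pause` or a larger `step`.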