Python Requests: fetching a target page's HTML, could someone take a look at my code?

Shared by Byrx.net on 2019-03-23 04:03:38, comments (189)


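While learning to write Python crawlers, I wanted to practice by scraping the images on this site: http://www.topit.me/

However, on the first visit the site serves this page instead: http://www.topit.me/event/warmup/welcome/views/index.html
so I never get the HTML I actually want.

How can I work around this? Thanks! My code is below: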

import re
import requests

# Pretend to be a desktop browser
Topit_headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36'}
Topit_Html = requests.get('http://www.topit.me/', headers=Topit_headers)
# Collect every src attribute that is immediately followed by a style attribute
Pic_url = re.findall('src="(.*?)" style', Topit_Html.text, re.S)
print(Topit_Html.cookies)

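Add a Cookie to Topit_headers.
The first visit to the home page redirects to a welcome page. That page has an [Enter web version] button, and once you click it the redirect no longer happens, so the button must set some flag that controls the redirect. The page source shows that the flag is a cookie: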
$.cookie('is_click' , '1',{expires: 100,path:'/',domain:'topit.me'});
So just send this cookie along when you request the home page:
curl 'http://www.topit.me/' -H 'Cookie: is_click=1;'
Also, the page source doesn't seem to contain anything like 'src="(.*?)" style', so that pattern probably won't match anything.
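
For reference, the curl command above might translate to Requests roughly as follows. This is only a sketch: the cookie name and value come from the `$.cookie(...)` call on the welcome page, but the helper names (`build_request`, `fetch_homepage`, `extract_image_urls`) are made up for illustration, and the looser `<img ... src="...">` pattern is an assumption about the markup, not something checked against the live page.

```python
import re

import requests

URL = "http://www.topit.me/"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36"
}
# The flag the welcome page sets when the button is clicked.
COOKIES = {"is_click": "1"}

# Assumed pattern: any <img> tag with a double-quoted src attribute.
# The real page may be structured differently.
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+)"', re.I)


def build_request():
    """Prepare the GET request without sending it, so the headers can be inspected."""
    return requests.Request("GET", URL, headers=HEADERS, cookies=COOKIES).prepare()


def fetch_homepage():
    """Send the prepared request and return the response."""
    with requests.Session() as session:
        return session.send(build_request(), timeout=10)


def extract_image_urls(html):
    """Return the src value of every <img> tag matched by IMG_SRC."""
    return IMG_SRC.findall(html)
```

Calling `fetch_homepage()` and checking `response.url` should show whether the welcome-page redirect still occurs; `extract_image_urls(response.text)` then lists whatever image URLs the assumed pattern finds. Passing `cookies=COOKIES` directly to `requests.get` would send the same cookie.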
