Modifying Search Engine Results with the Python Standard Library (1)

Shared by Byrx.net on 2019-03-21 06:03:14


Getting good at the Python standard library takes ongoing study. This article walks through one practical use of it, which should help with your own later work and learning.

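The keyword I type is passed as a URL parameter to a program on the search engine's side, which returns a page made of three parts: the header (logo and search UI), the results in the middle, and the copyright information at the bottom. What we want is the middle results section. The urlopen function in the standard library's urllib fetches the whole page as a string, and parsing that string is enough to cut the results section out. Wrap the extracted string in our own header and footer and the rough prototype of a "search thief" is done. Let's start with some test code.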
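Passing the keyword as a URL parameter only works reliably if the keyword is percent-encoded first. Below is a minimal sketch of that step, assuming Baidu's wd parameter and the http://www.baidu.com/s endpoint taken from targetURL in the test code; build_search_url is a hypothetical helper, not part of the original article. It relies on urllib.parse.urlencode, which encodes spaces, "&" and Chinese characters, something plain string concatenation does not do.

```python
from urllib.parse import urlencode


def build_search_url(keyword, base="http://www.baidu.com/s", param="wd"):
    """Return `base` with `keyword` percent-encoded as the `param` query parameter.

    `base` and `param` are assumptions taken from the article's targetURL.
    """
    # urlencode uses quote_plus: spaces become "+", "&" becomes "%26",
    # and non-ASCII text is UTF-8 encoded before percent-encoding
    return base + "?" + urlencode({param: keyword})
```

For example, build_search_url("python stdlib") gives http://www.baidu.com/s?wd=python+stdlib.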

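[code]
# Search Thief
# creator: Singo
# date: 2007-8-24
# (ported to Python 3; the original was written for Python 2)
import os
import re
from urllib.parse import quote
from urllib.request import urlopen


class SearchThief:
    """Fetch a search engine's results page and save it to disk."""

    path = "pages"  # directory the fetched pages are saved in
    # targetURL = "http://www.google.cn/search?complete=1&hl=zh-CN&q="
    targetURL = "http://www.baidu.com/s?wd="

    def __init__(self, key):
        self.key = key

    def getPage(self):
        # percent-encode the keyword so spaces and non-ASCII text form a valid URL
        with urlopen(self.targetURL + quote(self.key)) as resp:
            webStr = resp.read()  # get the page from the url, as raw bytes
        self.setPageToFile(webStr)

    def setPageToFile(self, webStr):
        self.key = re.sub("\r", "", self.key)  # remove any stray "\r" from the keyword
        os.makedirs(self.path, exist_ok=True)  # create the output directory if needed
        # binary mode keeps the page's original encoding untouched
        with open(os.path.join(self.path, self.key + ".html"), "wb") as targetFile:
            targetFile.write(webStr)
        print("done")


if __name__ == "__main__":
    inputKey = input("Enter what you want to search for --> ")
    obj = SearchThief(inputKey)
    obj.getPage()
[/code]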

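Here we simply ask the user for a keyword, submit a request to the search engine, and save the returned page into a directory. This is only a test. A real search thief would not save the page at all: it would drop the extracted string into a template designed in advance and show the result directly to the client as a web page. That way we can take the results of a given search engine and present them in a new page of our own.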
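Serving the page directly instead of saving it can be sketched with the standard library's WSGI conventions (PEP 3333). This is an illustration under assumptions, not the author's code: make_app, PAGE and the search callable are hypothetical names, the query parameter q is assumed, and search stands in for whatever fetch-and-extract step returns the results HTML. The keyword is HTML-escaped before it goes into the title; the results HTML is inserted unchanged.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical page skeleton; a real version would use the article's template.
PAGE = ('<html><head><meta charset="utf-8" />'
        "<title>SuperSingo - {title}</title></head>"
        '<body><div id="result">{result}</div></body></html>')


def make_app(search):
    """Build a WSGI app; `search` (hypothetical) maps a keyword to results HTML."""
    def app(environ, start_response):
        # read the keyword from the query string, e.g. "?q=python"
        query = parse_qs(environ.get("QUERY_STRING", ""))
        keyword = query.get("q", [""])[0]
        # only call the search step when a keyword was actually given
        result_html = search(keyword) if keyword else ""
        body = PAGE.format(title=escape(keyword), result=result_html).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app
```

To try it locally, wsgiref.simple_server.make_server("", 8000, make_app(my_search)).serve_forever() serves the app on port 8000, where my_search is your own fetch-and-extract function.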

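Looking at the source of Baidu's results page, there is a <DIV id=Div></DIV> tag just before the table tag that holds the search results. Starting from that tag and moving down two lines gives us the result set, so we add a method, getResultStr():
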
[code]
def getResultStr(self, webStr):
    # webStr is the page already decoded to a str (not a file object)
    webStrList = webStr.splitlines()  # split the page into lines ("\r\n" or "\n")
    # the result list sits two lines below the "<DIV id=Div></DIV>" line
    line = webStrList.index("<DIV id=Div></DIV>") + 2
    resultStr = webStrList[line]
    return resultStr
[/code]
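list.index raises ValueError when the marker line is missing or carries extra whitespace, so a slightly more tolerant variant may be worth having. In this sketch get_result_str is a hypothetical name; the marker and the two-line offset come from the article, while stripping whitespace before comparing and returning None on failure are assumptions.

```python
def get_result_str(page, marker="<DIV id=Div></DIV>", offset=2):
    """Return the line `offset` lines below the first line equal to `marker`.

    Lines are compared after stripping surrounding whitespace. Returns None
    if the marker is missing or the page ends before the target line.
    """
    lines = page.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == marker:
            target = i + offset
            return lines[target] if target < len(lines) else None
    return None
```

Any line-based rule like this breaks as soon as Baidu changes its markup; html.parser.HTMLParser from the standard library can walk the tags instead of relying on line layout.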

Now that we have the result list, we need to put it into a page of our own design. We'll call this page the template:

[code]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>SuperSingo Search-%title%</title>
<link href="default/css/global.css" type="text/css" rel="stylesheet" />
</head>
<body>
<div id="top">
<div id="logo"><img src="default/images/logo.jpg" /></div>
<div id="searchUI">
<input type="text" style="width:300px;" />
<input type="submit" value="Search" />
</div>
<div class="clear"></div>
</div>
<div id="result_info">
Found ××× results in ××× seconds
</div>
<div id="result">%result%</div>
<div id="foot">
<!-- copyright information -->
</div>
</body>
</html>
[/code]

All the search results here come straight from Baidu! %title% and %result% are placeholders waiting to be replaced, and to replace them we add one more method.
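The replacement method itself is not included in this part of the article. A minimal sketch of what it needs to do, assuming both placeholders are swapped in a single re.sub pass (fill_template is a hypothetical name, not the author's), might look like this:

```python
import re
from html import escape


def fill_template(template, title, result_html):
    """Replace %title% and %result% in `template`.

    The title comes from user input, so it is HTML-escaped; result_html is
    already markup and is inserted unchanged.
    """
    values = {"title": escape(title), "result": result_html}
    # one pass over the template: text inserted for one placeholder is never
    # re-scanned for the other placeholder
    return re.sub(r"%(title|result)%", lambda m: values[m.group(1)], template)
```

When writing the finished page out, encode it with the charset declared in the template's meta tag.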
