[Python] Subdomain lookup script


Tags: domain

Learning to script just takes practice: write more and it starts to click. Here is a super crude subdomain lookup script I wrote myself.

# coding: utf-8
# Python 2 script: collect subdomains of a domain from Bing search results.
import re
import urllib2
from bs4 import BeautifulSoup

key = raw_input("please input top domain: ")
print "Query starting..."
domainlist = []
# Bing's "first" parameter is the 1-based index of the first result on a page.
# Assuming 10 results per page, step through 1, 11, 21, ... for 65 pages.
for first in xrange(1, 651, 10):
    url = "https://cn.bing.com/search?q=domain:" + key + "&first=%s" % first
    try:
        req = urllib2.Request(url)
        resp = urllib2.urlopen(req).read()
        # Use BeautifulSoup to extract the result titles
        bsObj = BeautifulSoup(resp, "lxml")
        titles = [t.get_text() for t in bsObj.find_all("h2", {"class": ""})]
        # Use a regular expression to extract the subdomains
        domains = []
        for i in re.findall('<cite>(.*?)</cite>', resp):
            # Drop inline tags and the URL scheme, then keep only the host part
            host = re.sub(r'^https?://', '', re.sub(r'<[^>]+>', '', i).strip())
            domains.append(host.split('/')[0])
        domainlist.extend(domains)
        # Print the titles and subdomains found on this page side by side
        for (i, j) in zip(titles, domains):
            print "%-50s%-30s" % (i, j)
    except Exception, e:
        print e
print "All queries finished..."
# Remove duplicate subdomains
domainlists = list(set(domainlist))
# Save the subdomains, one per line
with open('subdomain.txt', 'a') as fw:
    for line in domainlists:
        fw.write(line + '\n')
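The `<cite>` parsing step in the script is the part most likely to go wrong, so here is a minimal standalone sketch of one way to do it. It is written for Python 3 (the script above is Python 2), and the names `extract_hosts`, `CITE_RE`, and `TAG_RE`, as well as the sample HTML, are made up for illustration. It assumes the search page puts each result URL inside a `<cite>` element, as the script does. A URL parser is safer here than `str.strip('https://')`, because `strip` removes any of the characters `h t p s : /` from both ends rather than a prefix: `'https://shop.example.com'.strip('https://')` gives `'op.example.com'`.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helpers for illustration; not part of the original script.
CITE_RE = re.compile(r"<cite>(.*?)</cite>", re.S)
TAG_RE = re.compile(r"<[^>]+>")


def extract_hosts(html, top_domain):
    """Return hostnames under top_domain found in <cite> tags.

    Duplicates are dropped and first-seen order is kept.
    """
    seen = {}
    for raw in CITE_RE.findall(html):
        # Remove inline markup such as <strong> and keep the first token,
        # since a cite may look like "www.example.com › about".
        parts = TAG_RE.sub("", raw).split()
        if not parts:
            continue
        text = parts[0]
        # urlsplit only fills in the host when a scheme is present.
        if "://" not in text:
            text = "http://" + text
        try:
            host = (urlsplit(text).hostname or "").lower()
        except ValueError:
            continue  # skip anything that does not parse as a URL
        if host == top_domain or host.endswith("." + top_domain):
            seen.setdefault(host, None)
    return list(seen)


if __name__ == "__main__":
    sample = (
        "<cite>https://shop.example.com/cart</cite>"
        "<cite>www.<strong>example.com</strong> › about</cite>"
        "<cite>https://shop.example.com/</cite>"
        "<cite>http://other.org/x</cite>"
    )
    for host in extract_hosts(sample, "example.com"):
        print(host)
```

Filtering on the top domain also discards unrelated hosts that happen to appear in the results, and the dict keeps the output order stable, which a plain `set` does not.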

[Screenshot: the script running]

[Screenshot: the query results]


Original article: https://www.cnblogs.com/peterpan0707007/p/8831183.html
