Scraping hero data for 率土之滨 with a web crawler
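In my spare time I play 《率土之滨》, which I quite enjoy, and I wanted to build a battle simulator for it. I checked the official site for usable data and found only basic hero stats, with no growth data for the heroes. Oh well, I'll scrape whatever is there.
The code is below. I'm new to Python, so pointers are welcome. The Python version is 2.7.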
# -*- coding: utf-8 -*-
# Python 2.7 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup  # for parsing HTML
import urllib2
import sys

# Python 2 workaround so unicode text can be printed or piped without encode errors
reload(sys)
sys.setdefaultencoding('utf-8')


class heroInfo:
    def __init__(self):
        self.heroName = ''
        self.heroCost = ''
        self.herobingzhong = ''   # troop type
        self.herojuli = ''        # attack range
        self.heromoulue = ''      # strategy
        self.herogongji = ''      # initial attack
        self.herogongcheng = ''   # initial siege
        self.herofangyu = ''      # defense
        self.herosudu = ''        # speed
        self.herojineng = ''      # skill
        self.heroother = ''       # other notes


herolist = []  # collected across all pages, so created once outside the loop

for i in range(646):
    # Hero pages are numbered 100001.html through 100646.html
    url = "http://stzb.163.com/herolist/100%03d.html" % (i + 1)
    try:
        page = urllib2.urlopen(url)
        r = page.read().decode('gbk')
    except (urllib2.URLError, UnicodeDecodeError), err:
        print err
        continue

    soup = BeautifulSoup(r)
    content = soup.find(name='div', attrs={'class': 'role-content'})
    # Skip pages that load but carry no hero block
    if content is None or content.h1 is None:
        continue

    hinfo = heroInfo()
    hinfo.heroName = content.h1.text
    herolist.append(hinfo)

    # The first dl.group holds the skill text; later ones hold other notes
    grouplist = content.findAll(name='dl', attrs={'class': 'group'})
    for idx, item in enumerate(grouplist):
        if idx == 0:
            hinfo.herojineng = item.dd.text
        else:
            hinfo.heroother = item.dd.text

    # Each stat sits in a span whose text includes its label
    for item in content.findAll('span'):
        if u'cost' in item.text:
            hinfo.heroCost = item.text
        if u'兵种' in item.text:
            hinfo.herobingzhong = item.text
        if u'攻击距离' in item.text:
            hinfo.herojuli = item.text
        if u'谋略' in item.text:
            hinfo.heromoulue = item.text
        if u'初始攻击' in item.text:
            hinfo.herogongji = item.text
        if u'初始攻城' in item.text:
            hinfo.herogongcheng = item.text
        if u'防御' in item.text:
            hinfo.herofangyu = item.text
        if u'速度' in item.text:
            hinfo.herosudu = item.text

    # One comma-separated line per hero; each field appears once
    # (the original printed the skill twice and left out range and defense)
    print u','.join([
        hinfo.heroName, hinfo.herobingzhong, hinfo.heroCost, hinfo.herojuli,
        hinfo.herogongji, hinfo.heromoulue, hinfo.herofangyu, hinfo.herosudu,
        hinfo.herogongcheng, hinfo.herojineng, hinfo.heroother,
    ])
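The script prints each hero as a comma-joined line, which breaks as soon as a skill description itself contains a comma. A rough sketch of one way around that is to let the csv module do the quoting. Note the assumptions: this sketch targets Python 3, where csv accepts Unicode text directly (under 2.7 the values would need encoding to UTF-8 bytes first), and the column names and the sample row are made up for illustration, not taken from the site.

```python
import csv
import io

# Hypothetical column names, loosely mirroring the fields the scraper collects.
FIELDS = ["name", "troop_type", "cost", "skill"]


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text, quoting fields that contain commas."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


# Made-up sample row, not real game data.
sample = [{
    "name": "hero A",
    "troop_type": "cavalry",
    "cost": "cost 3",
    "skill": "hits one enemy, then retreats",
}]
print(rows_to_csv(sample))
```

The skill field comes out wrapped in double quotes, so a spreadsheet or csv.reader still sees four columns.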
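The scraped data is slightly flawed: NetEase appears to have removed some hero pages, so only about 430 heroes actually come back. In theory the script runs as-is once BeautifulSoup is installed. Feel free to take it and give it a try.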
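To see exactly which page numbers are gone rather than guessing at the count, the fetch results can be tallied. Below is a minimal sketch under some assumptions: `scan_pages` and `fake_fetch` are hypothetical names of my own, and the fetcher is passed in as a plain function so the idea can be tried without hitting the site. Catching IOError is enough for the real case because urllib2.URLError derives from it.

```python
def scan_pages(ids, fetch):
    """Call fetch(page_id) for each id and return (found, missing) id lists.

    fetch is a hypothetical callable that returns the page HTML,
    or raises IOError when the page cannot be retrieved.
    """
    found, missing = [], []
    for page_id in ids:
        try:
            fetch(page_id)
        except IOError:
            missing.append(page_id)
        else:
            found.append(page_id)
    return found, missing


# Stand-in fetcher for illustration only: pretends even-numbered pages were deleted.
def fake_fetch(page_id):
    if page_id % 2 == 0:
        raise IOError("page %d not found" % page_id)
    return "<html></html>"


found, missing = scan_pages(range(1, 7), fake_fetch)
print("found %d, missing %d" % (len(found), len(missing)))
```

In the real script the stand-in would be replaced by a function that opens the hero URL, and the missing list shows which ids to skip on later runs.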