Scraping hero data from 率土之滨 with a web crawler


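In my spare time I have been playing 《率土之滨》 and find it pretty good, so I wanted to build a battle simulator for it. I checked what data the official site offers. Only the basic hero stats are available, and there is no hero growth data at all. Oh well, I'll just scrape whatever I can...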

Here is the code. I'm new to Python, so any advice is welcome. The Python version is 2.7.

# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup          # For processing HTML
import urllib2
import sys
reload(sys)
# Python 2 workaround: make implicit str <-> unicode conversions use UTF-8 instead of ASCII
sys.setdefaultencoding('utf-8')
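class heroInfo:
    """One hero's stats; each field holds the raw text scraped from the page."""
    def __init__(self):
        self.heroName = ''       # hero name
        self.heroCost = ''       # cost
        self.herobingzhong = ''  # troop type (兵种)
        self.herojuli = ''       # attack range (攻击距离)
        self.heromoulue = ''     # strategy (谋略)
        self.herogongji = ''     # initial attack (初始攻击)
        self.herogongcheng = ''  # initial siege (初始攻城)
        self.herofangyu = ''     # defense (防御)
        self.herosudu = ''       # speed (速度)
        self.herojineng = ''     # skill: text of the first <dl class="group">
        self.heroother = ''      # other info: text of the later <dl class="group">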


# Hero pages are numbered 100001.html ... 100646.html
for i in range(646):
    url = '%03d' % (i + 1)   # zero-pad the id to three digits: 1 -> '001'
    try:
        page = urllib2.urlopen("http://stzb.163.com/herolist/100" + url + ".html")
        r = page.read().decode('gbk')   # the pages are GBK-encoded
    except urllib2.URLError as err:     # also catches HTTPError such as 404
        print err
        continue
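    soup = BeautifulSoup(r)
    content = soup.find(name='div', attrs={'class': 'role-content'})
    if content is None or content.h1 is None:
        # skip pages that lack the hero block instead of crashing on content.h1
        print 'no hero data:', url
        continue
    hinfo = heroInfo()
    hinfo.heroName = content.h1.text
    nextsoup = content   # search only inside the hero's content block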

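    # The first <dl class="group"> holds the skill; each later one overwrites
    # heroother, so the last one is kept. Use j so the outer loop's i is not shadowed.
    grouplist = nextsoup.findAll(name='dl', attrs={'class': 'group'})
    for j, item in enumerate(grouplist):
        if item.dd is None:
            continue
        if j == 0:
            hinfo.herojineng = item.dd.text
        else:
            hinfo.heroother = item.dd.text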

    # Each stat sits in its own <span>; classify the span by keyword
    # (separate ifs, not elif, so one span may fill several fields)
    for item in nextsoup.findAll('span'):
        text = item.text
        if u'cost' in text:
            hinfo.heroCost = text
        if u'兵种' in text:        # troop type
            hinfo.herobingzhong = text
        if u'攻击距离' in text:    # attack range
            hinfo.herojuli = text
        if u'谋略' in text:        # strategy
            hinfo.heromoulue = text
        if u'初始攻击' in text:    # initial attack
            hinfo.herogongji = text
        if u'初始攻城' in text:    # initial siege
            hinfo.herogongcheng = text
        if u'防御' in text:        # defense
            hinfo.herofangyu = text
        if u'速度' in text:        # speed
            hinfo.herosudu = text
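    # One comma-separated line per hero (previously herojineng was printed twice
    # while attack range and defense were never printed)
    print u','.join([hinfo.heroName, hinfo.herobingzhong, hinfo.heroCost,
                     hinfo.herojuli, hinfo.herogongji, hinfo.herofangyu,
                     hinfo.heromoulue, hinfo.herosudu, hinfo.herogongcheng,
                     hinfo.herojineng, hinfo.heroother])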
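The script writes each hero as one line with fields joined by bare commas. A skill description that itself contains a comma would therefore shift every later column. The sketch below addresses that. It is a minimal Python 3 sketch, not the author's code, since the script above targets Python 2.7. It keeps the keyword-to-field classification of the `<span>` texts as a table instead of an if-chain, and lets the standard `csv` module quote awkward fields. The names `FIELD_KEYWORDS`, `COLUMNS`, `classify_spans` and `rows_to_csv` are hypothetical. The exact span text on the site is an assumption, so the strings in the test are made up.

```python
import csv
import io

# Hypothetical table mirroring the if-chain above: (heroInfo field, keyword in span text)
FIELD_KEYWORDS = [
    ("heroCost", "cost"),
    ("herobingzhong", "兵种"),    # troop type
    ("herojuli", "攻击距离"),     # attack range
    ("heromoulue", "谋略"),       # strategy
    ("herogongji", "初始攻击"),   # initial attack
    ("herogongcheng", "初始攻城"),  # initial siege
    ("herofangyu", "防御"),       # defense
    ("herosudu", "速度"),         # speed
]

# Output column order: name, the span-based stats, then skill and other info
COLUMNS = ["heroName"] + [field for field, _ in FIELD_KEYWORDS] + ["herojineng", "heroother"]


def classify_spans(span_texts):
    """Return a row dict; each span text goes to every field whose keyword it
    contains (like the separate ifs in the script, a later match overwrites)."""
    row = {name: "" for name in COLUMNS}
    for text in span_texts:
        for field, keyword in FIELD_KEYWORDS:
            if keyword in text:
                row[field] = text
    return row


def rows_to_csv(rows):
    """Serialize row dicts to CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

In a real crawl you would build one row per hero and write to a file opened with `open("heroes.csv", "w", newline="", encoding="utf-8")` instead of a `StringIO`. A spreadsheet then sees one column per stat even when the skill text contains commas.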

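The scraped data has a few flaws. NetEase appears to have deleted the pages of some heroes, so only about 430 heroes actually exist among the 646 IDs requested. In principle the script runs as-is once BeautifulSoup (version 3, which provides the `BeautifulSoup` module imported above) is installed, so feel free to give it a try.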
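Many of the 646 IDs turn out to have no hero behind them, so it helps to track which pages were actually usable. The sketch below does that bookkeeping. It is a minimal Python 3 sketch, not the original script. It reuses the URL pattern from the script above and records a page as missing if the request fails (for example an HTTP 404) or if the returned HTML lacks the `role-content` block. How the site really answers for a deleted hero is an assumption. The names `BASE_URL`, `hero_urls`, `fetch_gbk` and `crawl` are hypothetical. The downloader is passed in as `fetcher`, so the bookkeeping can be tested without network access.

```python
import urllib.error
import urllib.request

# URL pattern from the script above: "100" + three-digit zero-padded id
BASE_URL = "http://stzb.163.com/herolist/100{:03d}.html"


def hero_urls(count=646):
    """Yield hero page URLs for ids 1..count, each id zero-padded to three digits."""
    for n in range(1, count + 1):
        yield BASE_URL.format(n)


def fetch_gbk(url, timeout=10):
    """Download one page and decode it as GBK, as the original script does;
    undecodable bytes are replaced instead of raising."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("gbk", errors="replace")


def crawl(urls, fetcher=fetch_gbk, marker="role-content"):
    """Return (pages, missing): pages maps url -> html for pages containing the
    hero block; missing lists urls that failed to load or lack the block."""
    pages, missing = {}, []
    for url in urls:
        try:
            html = fetcher(url)
        except urllib.error.URLError:  # HTTPError (e.g. 404) is a subclass
            missing.append(url)
            continue
        if marker in html:
            pages[url] = html
        else:
            missing.append(url)  # assumed: a removed hero may serve a generic page
    return pages, missing
```

Calling `pages, missing = crawl(hero_urls())` and printing `len(pages)` and `len(missing)` would show how many of the 646 IDs still correspond to a hero page. Only `pages` needs to go through the BeautifulSoup parsing step.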
