Scraping novel chapters from the web and writing them to a txt file


The script reads a novel's catalogue page, collects the chapter hyperlinks, sorts them by chapter number, then downloads each chapter and appends its title and text to a local txt file.

[Python] code

import urllib.request as web
import re


def getContent(url):
    """Extract the chapter title and body text from a chapter page's HTML."""
    http = str(web.urlopen(url).read(), encoding='GBK')
    title = re.findall('<h1>.*?</h1>', http)[0]
    title = re.sub('</?h1>', '', title)
    # re.S lets the pattern span line breaks inside the content div
    content = re.findall('<div id="content">.*?</div>', http, re.S)[0]
    content = re.sub('<br />', '\n', content)
    content = re.sub('<div id="content">|</div>', '', content)
    content = re.sub('&nbsp;', ' ', content)  # turn non-breaking-space entities into plain spaces
    return (title, content)


def getUrlList(url):
    """Collect the chapter hyperlinks from the catalogue page's HTML."""
    http = str(web.urlopen(url).read(), encoding='GBK')
    # match <a> tags whose link text contains "章" (chapter)
    lis = re.findall('<a.*?章.*?</a>', http)
    hrefs = []
    for l in lis:
        try:
            hrefs.append(l.split('"')[1])  # the href is the first quoted attribute value
        except IndexError:
            pass
    return hrefs


if __name__ == '__main__':
    url = '小说地址url'  # placeholder: the novel's catalogue page URL; "<number>.html" is appended to it below
    f = open('e://name.txt', mode='w')
    urlList = getUrlList(url)
    numUrlList = []
    for u in urlList[:-1]:
        try:
            # chapter links look like "12345.html"; keep the numeric part so they can be sorted
            numUrlList.append(int(u[:-5]))
        except ValueError:
            pass
    numUrlList.sort()
    for href in numUrlList:
        h = url + str(href) + '.html'
        print(h)
        try:
            c = getContent(h)
        except Exception:
            # retry once before giving up on this chapter
            try:
                c = getContent(h)
            except Exception:
                print('failed to read', h)
                continue
        title, content = c
        print(title, 'done')
        f.write(title + '\n')
        f.write(content)
        f.write('\n')
    print('all done')
    f.close()
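For reference, the filtering in getContent comes down to two regular expressions: one pulls the chapter title out of the <h1> tag, the other grabs the body text from the content div before the cleanup substitutions run. The minimal sketch below applies those same steps to a made-up HTML snippet (the sample string is hypothetical and only mirrors the structure the script's regexes expect), so the tag stripping, <br /> to newline conversion, and &nbsp; replacement can be seen in isolation.

import re

# Hypothetical chapter HTML mirroring the structure getContent expects.
sample = ('<h1>Chapter 1 The Beginning</h1>'
          '<div id="content">First paragraph.<br />&nbsp;&nbsp;Second paragraph.</div>')

# Same extraction and cleanup steps as getContent, applied to the sample.
title = re.sub('</?h1>', '', re.findall('<h1>.*?</h1>', sample)[0])
body = re.findall('<div id="content">.*?</div>', sample, re.S)[0]
body = re.sub('<br />', '\n', body)
body = re.sub('<div id="content">|</div>', '', body)
body = re.sub('&nbsp;', ' ', body)

print(title)  # Chapter 1 The Beginning
print(body)   # First paragraph.
              #   Second paragraph.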
