Parsing RSS with Python
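First, install a package for parsing RSS. Download: http://feedparser.googlecode.com/files/feedparser-4.1.zip
After downloading the archive, unzip it, change into the extracted directory, and run python setup.py install
to install the package.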
The following interactive session shows how to parse a feed:
>>> import feedparser
>>> d = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
>>> d['feed']['title']             # feed data is a dictionary
u'Sample Feed'
>>> d.feed.title                   # get values attr-style or dict-style
u'Sample Feed'
>>> d.channel.title                # use RSS or Atom terminology anywhere
u'Sample Feed'
>>> d.feed.link                    # resolves relative links
u'http://example.org/'
>>> d.feed.subtitle                # parses escaped HTML
u'For documentation only'
>>> d.channel.description          # RSS terminology works here too
u'For documentation only'
>>> len(d['entries'])              # entries are a list
1
>>> d['entries'][0]['title']       # each entry is a dictionary
u'First entry title'
>>> d.entries[0].title             # attr-style works here too
u'First entry title'
>>> d['items'][0].title            # RSS terminology works here too
u'First entry title'
>>> e = d.entries[0]
>>> e.link                         # easy access to alternate link
u'http://example.org/entry/3'
>>> e.links[1].rel                 # full access to all Atom links
u'related'
>>> e.links[0].href                # resolves relative links here too
u'http://example.org/entry/3'
>>> e.author_detail.name           # author data is a dictionary
u'Mark Pilgrim'
>>> e.updated_parsed               # parses all date formats
(2005, 11, 9, 11, 56, 34, 2, 313, 0)
>>> e.content[0].value             # sanitizes dangerous HTML
u'Watch o
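The 9-tuple returned for e.updated_parsed in the session above is often easier to work with as a datetime object. The following is a minimal sketch using only the standard library. It assumes the tuple is expressed in UTC, which is how the feedparser documentation describes its parsed dates. The variable names are illustrative.

```python
import calendar
from datetime import datetime, timezone

# The 9-tuple printed for e.updated_parsed in the session above.
updated_parsed = (2005, 11, 9, 11, 56, 34, 2, 313, 0)

# calendar.timegm() interprets the tuple as UTC
# (time.mktime() would interpret it as local time instead).
timestamp = calendar.timegm(updated_parsed)

# Build a timezone-aware datetime from the UTC timestamp.
updated = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(updated.isoformat())  # 2005-11-09T11:56:34+00:00
```

With a real feed, e.updated_parsed can be passed to calendar.timegm() in the same way, provided the entry actually carries a date.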
Complete parsing program:
import feedparser

d = feedparser.parse('http://ued.taobao.com/blog/feed/')

# Feed-level metadata (RSS and Atom names are interchangeable)
print(d.feed.title)
print(d.channel.title)
print(d.feed.link)
print(d.feed.subtitle)
print(d.channel.description)
print('items length is %d' % (len(d['entries']),))

# Syndication-module fields; present only if the feed provides them
print(d.channel.sy_updatefrequency)
print(d.channel.sy_updateperiod)
print(d.channel.lastbuilddate)

for item in d.entries:
    print('item title = %s' % (item.title,))
    print('item link = %s' % (item.link,))
    print('item author = %s' % (item.author,))

    # Collect the term of every tag attached to the entry
    tags = []
    for tag in item.tags:
        tags.append(tag.term)
    print("item's tags = %s " % (','.join(tags),))

    print("item's updated time = %s" % (item.updated_parsed,))
    print('items comments count %s' % (item.slash_comments,))

    # An entry may carry several content blocks; join their values
    contents = []
    if isinstance(item.content, list):
        for c in item.content:
            contents.append(c.value)
    print('\r\n'.join(contents))
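Note: in practice, adjust the code to match the RSS feed you are parsing. Fields such as sy_updatefrequency, sy_updateperiod, slash_comments, tags and content exist only when the feed supplies them, so accessing them on a feed that lacks them will fail.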
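One way to cope with feeds that omit some fields is to read entries dict-style with default values, since each entry behaves like a dictionary. The sketch below is only an illustration: summarize_entry is a hypothetical helper, not part of feedparser, and the sample dictionary is a stand-in for one item of d.entries so that the example runs without network access.

```python
def summarize_entry(entry):
    """Return common fields of a feed entry, tolerating missing keys.

    `entry` is any mapping with feedparser-style keys. This helper is
    hypothetical and not part of feedparser itself.
    """
    # Tags may be absent, and a tag's term may be missing or None.
    tags = [t.get("term") or "" for t in entry.get("tags", [])]

    # Content is normally a list of blocks; ignore anything else.
    content = entry.get("content", [])
    if not isinstance(content, list):
        content = []

    return {
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        "author": entry.get("author", ""),
        "tags": ",".join(tags),
        "comments": entry.get("slash_comments"),  # None when absent
        "body": "\r\n".join(c.get("value") or "" for c in content),
    }


# Stand-in for one item of d.entries; author, content and
# slash_comments are deliberately left out.
sample = {
    "title": "First entry title",
    "link": "http://example.org/entry/3",
    "tags": [{"term": "python"}, {"term": "rss"}],
}

summary = summarize_entry(sample)
print(summary["title"])
print(summary["tags"])
```

With a parsed feed, the same helper could be applied as summarize_entry(item) for each item in d.entries.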