Parsing RSS with Python


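First, install the RSS parsing package, feedparser. Download: http://feedparser.googlecode.com/files/feedparser-4.1.zip

After downloading, unzip the archive, change into the extracted directory, and run python setup.py install to install the package.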
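
To confirm that the installation worked before going further, a quick check like the one below can help. This is only a minimal sketch, assuming Python 3; the helper name is_installed is made up for illustration and is not part of feedparser.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment.

    Hypothetical helper for illustration; it only looks the module up
    and does not import it.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Reports whether the feedparser package is importable here.
    print("feedparser installed:", is_installed("feedparser"))
```

If this reports False, the install step above most likely ran under a different Python interpreter than the one being used now.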

Here is sample code for parsing a feed:

>>> import feedparser
>>> d = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
>>> d['feed']['title']             # feed data is a dictionary
u'Sample Feed'
>>> d.feed.title                   # get values attr-style or dict-style
u'Sample Feed'
>>> d.channel.title                # use RSS or Atom terminology anywhere
u'Sample Feed'
>>> d.feed.link                    # resolves relative links
u'http://example.org/'
>>> d.feed.subtitle                # parses escaped HTML
u'For documentation only'
>>> d.channel.description          # RSS terminology works here too
u'For documentation only'
>>> len(d['entries'])              # entries are a list
1
>>> d['entries'][0]['title']       # each entry is a dictionary
u'First entry title'
>>> d.entries[0].title             # attr-style works here too
u'First entry title'
>>> d['items'][0].title            # RSS terminology works here too
u'First entry title'
>>> e = d.entries[0]
>>> e.link                         # easy access to alternate link
u'http://example.org/entry/3'
>>> e.links[1].rel                 # full access to all Atom links
u'related'
>>> e.links[0].href                # resolves relative links here too
u'http://example.org/entry/3'
>>> e.author_detail.name           # author data is a dictionary
u'Mark Pilgrim'
>>> e.updated_parsed               # parses all date formats
(2005, 11, 9, 11, 56, 34, 2, 313, 0)
>>> e.content[0].value             # sanitizes dangerous HTML
u'Watch o
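
The updated_parsed value in the session above is a plain 9-tuple in the style of time.struct_time, which is awkward to compare or format directly. The sketch below shows one way to turn it into a datetime object. It assumes Python 3 and assumes the tuple is expressed in UTC; the function name struct_time_to_datetime is hypothetical and not part of feedparser.

```python
import calendar
from datetime import datetime, timezone


def struct_time_to_datetime(parsed):
    """Convert a 9-tuple such as entry.updated_parsed to an aware datetime.

    Assumption: the tuple is in UTC. Returns None when no date was parsed.
    """
    if parsed is None:
        return None
    # calendar.timegm treats the tuple as UTC and returns epoch seconds.
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


# The tuple shown in the interactive session above.
stamp = struct_time_to_datetime((2005, 11, 9, 11, 56, 34, 2, 313, 0))
print(stamp.isoformat())  # 2005-11-09T11:56:34+00:00
```

calendar.timegm is used instead of time.mktime because mktime would interpret the tuple in the local time zone.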

The complete parsing program:

import feedparser

d = feedparser.parse('http://ued.taobao.com/blog/feed/')
print d.feed.title
print d.channel.title
print d.feed.link
print d.feed.subtitle
print d.channel.description
print 'items length is %d' % (len(d['entries']),)
print d.channel.sy_updatefrequency
print d.channel.sy_updateperiod
print d.channel.lastbuilddate
for item in d.entries:
    print 'item title = %s' % (item.title,)
    print 'item link = %s' % (item.link,)
    print 'item author = %s' % (item.author,)
    tags = []
    for tag in item.tags:
        tags.append(tag.term)
    print 'item\'s tags = %s ' % (','.join(tags),)
    print 'item\'s updated time = ',item.updated_parsed
    print 'items comments count %s' % (item.slash_comments,)
    contents = []
    if isinstance(item.content,list):
        for c in item.content:
            contents.append(c.value)
    print '\r\n'.join(contents)

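Note: in practice you will need to adapt the code to the RSS feed you are parsing! Fields such as sy_updatefrequency, sy_updateperiod, and slash_comments come from optional feed extensions and are not present in every feed.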
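
One way to make the program tolerate feeds that lack some of these fields is to read each one with a default value instead of accessing it directly. The following is a minimal sketch, assuming Python 3 and assuming entries support dict-style access with .get(), as the dictionary-style examples earlier suggest. The function summarize_entry and the sample dictionary are made up for illustration.

```python
def summarize_entry(entry):
    """Collect common fields from a feed entry, tolerating missing keys.

    Hypothetical helper: works on any dict-like entry that offers .get().
    """
    tags = [tag.get("term", "") for tag in entry.get("tags", [])]
    contents = [c.get("value", "") for c in entry.get("content", [])]
    return {
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        "author": entry.get("author", ""),
        "tags": tags,
        "comments": entry.get("slash_comments"),  # None when the feed omits it
        "body": "\n".join(contents),
    }


# A stand-in entry with several fields missing, used here instead of a live feed.
sample = {
    "title": "First entry title",
    "link": "http://example.org/entry/3",
    "tags": [{"term": "python"}],
}
print(summarize_entry(sample))
```

With a real feed, the same function could be called on each item in d.entries, so a feed without authors, tags, or comment counts no longer stops the loop with an error.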
