Q: How do I write Python code that reads a tag's attribute information with BeautifulSoup?

    <dd         isStop = "1" class='isStop'        matchcode="201409066001"        matchnumcn ="周六001"        starttime = "1409994000000"        endtime ="1409993820000"             isattention = "0"        hostname="北九州" guestname="福冈黄蜂"        leagueid = "533"        hostteamid = "46148"        visitteamid = "12193"        matchid="1000817"        leagueName="J2联赛"        class="league_533"style="display: none;"        ishot="0"        >pass</dd>

For example, what I want to get is:

style="display: none;"

How can I get the "none" out of this attribute?

Here is the code:

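    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from bs4 import BeautifulSoup

    tag_content = """<dd
        isStop = "1" class='isStop'
        matchcode="201409066001"
        matchnumcn ="周六001"
        starttime = "1409994000000"
        endtime ="1409993820000"
        isattention = "0"
        hostname="北九州" guestname="福冈黄蜂"
        leagueid = "533"
        hostteamid = "46148"
        visitteamid = "12193"
        matchid="1000817"
        leagueName="J2联赛"
        class="league_533"
        style="display: none;"
        ishot="0">pass</dd>"""

    # Name the parser explicitly so the result does not depend on
    # which optional parsers happen to be installed.
    tag_soup = BeautifulSoup(tag_content, "html.parser")

    # Attribute access on a tag works like a dict lookup.
    style_str = tag_soup.dd["style"]  # "display: none;"

    # Take the part after the colon, then drop whitespace and the trailing ";".
    print(style_str.split(":")[1].strip().rstrip(";"))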

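Beautiful Soup cannot return "none" directly, but it easily gives us the full attribute value, display: none;, and from there ordinary Python string handling does the rest.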

Use tag.attrs["style"], then a regular expression.
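The regular-expression step might look something like the sketch below. It assumes style_str already holds the string that tag.attrs["style"] returned for the dd tag; the parse_style helper and the DECLARATION pattern are made-up names for illustration, and the pattern only covers simple "property: value" declarations, not every form CSS allows.

```python
import re

# Assumed input: the string returned by tag.attrs["style"] for the <dd> tag.
style_str = "display: none;"

# One "property: value" declaration; the value ends at ";" or at end of string.
DECLARATION = re.compile(r"\s*([\w-]+)\s*:\s*([^;]+?)\s*(?:;|$)")


def parse_style(style):
    """Return a dict mapping CSS property names to values (hypothetical helper)."""
    return {name.lower(): value for name, value in DECLARATION.findall(style)}


styles = parse_style(style_str)
print(styles["display"])  # prints: none
```

Turning the whole style into a dict means other properties, if the tag ever has more than one, can be looked up the same way.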

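1. Ideally Python's cgi module would have a dedicated method for pulling the style, or any other attribute, out of HTML. This style has no id or name and is not a value either, so I don't know whether it can be fetched that way.
2. My very clumsy fallback idea: wrap the whole block in ''' and save it, then in a separate .py file open it with open(), read it line by line with readlines(), use a regular expression to match style="display:, and finally cut the value out with string slicing.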
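On point 1, the standard library does ship an HTML parser that hands over a tag's attributes, which may be a sturdier route than reading the text line by line. The following is only a sketch under assumptions: AttrCollector is a hypothetical helper name, the dd snippet is shortened from the one in the question, and html.parser lowercases attribute names as it goes.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the attributes of every <dd> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "dd":
            self.found.append(dict(attrs))


# Shortened version of the <dd> tag from the question.
snippet = '<dd hostname="北九州" leagueid="533" style="display: none;" ishot="0">pass</dd>'

collector = AttrCollector()
collector.feed(snippet)
collector.close()

attrs = collector.found[0]
style = attrs["style"]
# Keep the part after the first colon, minus whitespace and the trailing ";".
display = style.split(":", 1)[1].strip().rstrip(";")
print(style)
print(display)
```

Because the parser works on tags rather than lines, it does not matter whether the attributes are spread over several lines or packed into one.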

    from pyquery import PyQuery

    s = """  <dd
        isStop = "1" class='isStop'
        matchcode="201409066001"
        matchnumcn ="周六001"
        starttime = "1409994000000"
        endtime ="1409993820000"
        isattention = "0"
        hostname="北九州" guestname="福冈黄蜂"
        leagueid = "533"
        hostteamid = "46148"
        visitteamid = "12193"
        matchid="1000817"
        leagueName="J2联赛"
        class="league_533"
        style="display: none;"
        ishot="0"
        >pass</dd>"""

    p = PyQuery(s)
    a = p("dd")  # select the <dd> element

    # attr() returns the value of the named attribute.
    print(a.attr('style'))
    print(a.attr('hostname'))

Output:

    display: none;
    北九州
