The Python re (Regular Expression) Module

Shared by Byrx.net on 2019-03-22 01:03:48


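The re module

Python provides regular-expression string matching through the re module. The functions it offers can be read in ~/installs/python/lib/python2.7/re.py; the most commonly used interfaces are the following: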

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

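re.match tries to match a pattern at the beginning of a string. The first argument is the regular expression, the second is the string to match against, and the third is the flags value, which defaults to 0. It returns a match object on success and None otherwise.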
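As a minimal sketch of the behaviour described above (the sample strings are made up for illustration, and print() is used so the snippet also runs on Python 3):

```python
import re

# re.match only succeeds when the pattern matches at position 0.
hit = re.match(r"\d+", "123abc")
miss = re.match(r"\d+", "abc123")

print(hit.group(0))   # the digits at the start of the string
print(miss)           # None: the string does not start with a digit

# The third argument carries flags, for example case-insensitive matching.
ci = re.match(r"hello", "HELLO world", re.IGNORECASE)
print(ci.group(0))
```

Because a failed match returns None, the result should be checked before calling group() on it.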

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

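re.search scans the string for the pattern and stops at the first match, returning None if nothing is found. Its parameters are the same as those of re.match. The difference is that match only matches at the start of the string, whereas search looks through the whole string.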
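The difference between the two functions can be seen by running the same pattern through both. This is only an illustrative sketch; the input string is a shortened version of the one used later in this article:

```python
import re

text = "tt=123 tm=(abc)"
pattern = r"tm=\((.*?)\)"

at_start = re.match(pattern, text)    # fails: "tm=" is not at index 0
anywhere = re.search(pattern, text)   # scans forward to the first hit

print(at_start)
print("%s found at index %d" % (anywhere.group(1), anywhere.start()))
```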

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

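re.findall returns all non-overlapping matches as a list. As the docstring notes, when the pattern contains groups, the list holds the group contents rather than the whole matches.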
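How the shape of the result depends on the number of groups is easiest to see side by side. The following sketch uses a made-up input string:

```python
import re

text = "a=1 b=22 c=333"

no_group = re.findall(r"\w=\d+", text)        # no groups: whole matches
one_group = re.findall(r"\w=(\d+)", text)     # one group: list of group 1
two_groups = re.findall(r"(\w)=(\d+)", text)  # two groups: list of tuples

print(no_group)
print(one_group)
print(two_groups)
```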

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)

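re.compile compiles a regular expression into a pattern object. Compiling frequently used expressions once and reusing the resulting object improves matching efficiency.

As mentioned above, search() and match() return a match object. Its attributes and methods are described below.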
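A compiled pattern object exposes the same operations as methods, so it can be built once and reused inside a loop. The sketch below assumes a small hypothetical list of input lines; the variable names are illustrative only:

```python
import re

# Compile once, then call the pattern object's own search() method.
name_re = re.compile(r"name=\((.*?)\)")

lines = ["phone=124567 name=(john)", "phone=2345678 name=(tom)", "phone=000"]
names = []
for line in lines:
    m = name_re.search(line)
    if m:                      # lines without a name yield None
        names.append(m.group(1))

print(names)
```

How much time this saves depends on the workload; the gain is most likely to matter when the same expression is applied many times.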

Match object

Attributes:

string

pos

endpos

lastindex

lastgroup
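To make the attributes listed above concrete, here is a small sketch that prints each of them for one match. The pattern, the group names key and val, and the input string are all hypothetical:

```python
import re

pat = re.compile(r"(?P<key>\w+)=(?P<val>\d+)")
m = pat.search("id=42 rest")

print(m.string)     # the string that was searched
print(m.pos)        # index where the search was allowed to start
print(m.endpos)     # index where the search was allowed to stop
print(m.lastindex)  # number of the last group that matched
print(m.lastgroup)  # name of the last group that matched
```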

Methods:

group([group1, …]):

m = re.match(r"(\d+)\.(\d*)", '3.14')

After this match, m.group(0) is '3.14', m.group(1) is '3', and m.group(2) is '14'.

groups([default]):

groupdict([default]):

start([group]):

end([group]):

span([group]):

expand(template):

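Of the attributes and methods above, group() is the one used most often.

Usage examples

The following code exercises some of the interfaces described above: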
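The remaining methods can be tried on the same '3.14' input. This is a sketch only; the group names whole and frac are introduced here for illustration and do not come from the original example:

```python
import re

m = re.match(r"(?P<whole>\d+)\.(?P<frac>\d*)", "3.14")

print(m.groups())        # all groups as a tuple
print(m.groupdict())     # named groups as a dict
print(m.start(2))        # start index of group 2
print(m.end(2))          # end index of group 2
print(m.span(2))         # (start, end) of group 2
print(m.expand(r"\2.\1"))  # fill a template with group contents

# The default argument replaces groups that did not take part in the match.
partial = re.match(r"(\d+)\.?(\d+)?", "3")
print(partial.groups("0"))
```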

import re

for name in re.findall(r"name=\((.*?)\)", "phone=124567 name=(john)phone=2345678 name=(tom)"):
    print("got match name %s" % name)

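Note that () marks a group in a regular expression. Because the data being matched also contains literal parentheses and we want the text inside them, the literal ( and ) have to be escaped. To curb greedy matching, *? is used so that each match covers as little text as possible.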
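The effect of *? is clearest when compared with the greedy form of the same pattern. A minimal sketch, using a slightly shortened input string:

```python
import re

text = "name=(john) phone=2345678 name=(tom)"

greedy = re.findall(r"name=\((.*)\)", text)   # .* runs to the last ")"
lazy = re.findall(r"name=\((.*?)\)", text)    # .*? stops at the first ")"

print(greedy)
print(lazy)
```

The greedy version returns a single match that swallows everything between the first "(" and the last ")", while the non-greedy version returns each name separately.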

k = re.search(r"tm=\((.*?)\)", "tt=123 tm=(abc) vm=test tm=(cba)")
if k:
    print("match tm is %s, %s, %d, %d, %s, %s" % (k.group(0), k.group(1), k.pos, k.endpos, k.string, k.re.pattern))

Output:

match tm is tm=(abc), abc, 0, 32, tt=123 tm=(abc) vm=test tm=(cba), tm=\((.*?)\)

text = "JGood is abc handsome boy he is cool, clever, and so on..."
# Define two named groups
m = re.search(r"\s(?P<sign1>\w+)\s(?P<sign2>\w+)\s", text)
if m:
    print("%s \t %s \t %s" % (m.group(0), m.group(1), m.group(2)))
    print("groups %s, %s, %s, %s, %s, %s" % (m.groups(), m.lastindex, m.lastgroup,
          list(m.groupdict().keys()), list(m.groupdict().values()), m.string[m.start(2):m.end(2)]))
    print("string span is %s, %s" % m.span(2))  # span() returns (start(group), end(group))

The (?P<name>...) syntax assigns custom group names, which is why m.groupdict() returns the named results.

Output:

 is abc      is      abc
groups ('is', 'abc'), 2, sign2, ['sign1', 'sign2'], ['is', 'abc'], abc
string span is 9, 12

