Python Regular Expressions: Study Notes
Regular Expressions
Creating a Regular Expression
import re
phoneNumberRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumberRegex.search('my number is 415-483-2925')
print('Phone Number Found: ' + mo.group())
Phone Number Found: 415-483-2925
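One detail the example above relies on: search() returns None when the pattern is not found, so calling group() on the result would raise an AttributeError. A minimal sketch of a guarded lookup follows; the helper name find_phone is made up for illustration.

```python
import re

phone_number_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def find_phone(text):
    """Return the first phone number found in text, or None (hypothetical helper)."""
    mo = phone_number_regex.search(text)
    # search() gives a match object on success and None on failure.
    return mo.group() if mo else None

print(find_phone('my number is 415-483-2925'))  # 415-483-2925
print(find_phone('no number here'))             # None
```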
Common Matching Patterns
Grouping with Parentheses
import re
phoneNumberRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumberRegex.search('my number is 415-483-2925')
print(mo.group())
print(mo.group(1))
print(mo.group(2))
415-483-2925
415
483-2925
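Besides group(1) and group(2), a match object can return every group at once through groups(), and groups can be given names with the (?P<name>...) syntax. The sketch below is one possible variation on the example above; the group names area and number are chosen here for illustration.

```python
import re

phone_regex = re.compile(r'(?P<area>\d{3})-(?P<number>\d{3}-\d{4})')
mo = phone_regex.search('my number is 415-483-2925')

# groups() returns all captured groups as a tuple, handy for unpacking.
area_code, main_number = mo.groups()
print(area_code)             # 415
print(main_number)           # 483-2925

# Named groups can be read by name instead of by position.
print(mo.group('area'))      # 415
```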
Matching Multiple Alternatives with the Pipe
import re
heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey.')
mo2 = heroRegex.search('Tina Fey and Batman')
print(mo1.group())
print(mo2.group())
Batman
Tina Fey
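When both alternatives occur in the text, the one that appears first is returned, as the output above shows. The pipe can also be placed inside parentheses so that several alternatives share a common prefix. A small sketch, with a sample sentence made up for illustration:

```python
import re

# 'Bat' is written once; the group lists the possible endings.
bat_regex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = bat_regex.search('Batmobile lost a wheel')

print(mo.group())    # Batmobile (the whole match)
print(mo.group(1))   # mobile    (only the part inside the parentheses)
```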
Optional Matching with the Question Mark
import re
batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
mo2 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())
print(mo2.group())
Batman
Batwoman
Matching Zero or More Times with the Star
import re
batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
mo2 = batRegex.search('The Adventures of Batwowowowowowoman')
print(mo1.group())
print(mo2.group())
Batman
Batwowowowowowoman
Matching One or More Times with the Plus
import re
batRegex = re.compile(r'Bat(wo)+man')
mo1 = batRegex.search('The Adventures of Batwowoman')
mo2 = batRegex.search('The Adventures of Batwowowowowowoman')
print(mo1.group())
print(mo2.group())
Batwowoman
Batwowowowowowoman
Matching a Specific Number of Repetitions with Curly Braces
import re
batRegex = re.compile(r'Bat(wo){6}man')
mo1 = batRegex.search('The Adventures of Batwowoman')  # None: 'wo' appears only twice
mo2 = batRegex.search('The Adventures of Batwowowowowowoman')
print(mo2.group())
Batwowowowowowoman
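To see the four quantifiers side by side, the following sketch runs each one against the same three words and records which words it accepts. The word list and the dictionary layout are assumptions made for this comparison.

```python
import re

patterns = {
    '?': r'Bat(wo)?man',      # zero or one 'wo'
    '*': r'Bat(wo)*man',      # zero or more 'wo'
    '+': r'Bat(wo)+man',      # one or more 'wo'
    '{2}': r'Bat(wo){2}man',  # exactly two 'wo'
}
words = ['Batman', 'Batwoman', 'Batwowoman']

results = {}
for name, pattern in patterns.items():
    regex = re.compile(pattern)
    # Keep the sample words in which this pattern finds a match.
    results[name] = [w for w in words if regex.search(w)]
    print(name, results[name])
```

With these inputs, ? accepts Batman and Batwoman, * accepts all three, + accepts Batwoman and Batwowoman, and {2} accepts only Batwowoman.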
Matching Everything with Dot-Star
import re
nameRegex = re.compile(r'First Name:(.*) Last Name:(.*)')
mo = nameRegex.search('First Name: Jackeroo Last Name: Liu')
print(mo.group())
First Name: Jackeroo Last Name: Liu
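Greedy and Non-Greedy Matching
Python regular expressions are greedy by default: when a match is ambiguous, they match the longest string possible. The non-greedy version of the curly braces, written with a question mark after the closing brace, matches the shortest string possible.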
import re
greedyHaRegex = re.compile(r'(Ha){3,5}')
mo1 = greedyHaRegex.search('HaHaHaHaHa')
print(mo1.group())
HaHaHaHaHa
import re
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
mo1 = nongreedyHaRegex.search('HaHaHaHaHa')
print(mo1.group())
HaHaHa
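The same question-mark suffix works after the star and the plus. A short sketch contrasting greedy and non-greedy dot-star; the sample text is made up for illustration:

```python
import re

text = '<To serve man> for dinner.>'

greedy_regex = re.compile(r'<.*>')      # runs to the last '>'
nongreedy_regex = re.compile(r'<.*?>')  # stops at the first '>'

greedy_match = greedy_regex.search(text).group()
nongreedy_match = nongreedy_regex.search(text).group()

print(greedy_match)      # <To serve man> for dinner.>
print(nongreedy_match)   # <To serve man>
```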
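The findall() Method
The findall() method of a Regex object returns a list of strings when the pattern contains no groups (when the pattern has more than one group, it returns a list of tuples instead). By contrast, search() returns only the first match.
import re
phoneNumberRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')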
mo = phoneNumberRegex.search('my home number is 415-483-2925, and work number is 428-243-9848')
print(mo.group())
415-483-2925
import re
phoneNumberRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumberRegex.findall('my home number is 415-483-2925, and work number is 428-243-9848')
print(mo)
['415-483-2925', '428-243-9848']
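When the pattern contains more than one group, findall() returns a list of tuples, one tuple per match and one item per group. If the full matched text is wanted as well, finditer() yields match objects. A minimal sketch using the same sentence as above:

```python
import re

text = 'my home number is 415-483-2925, and work number is 428-243-9848'
grouped_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

# With groups: each match becomes a tuple of its groups.
pairs = grouped_regex.findall(text)
print(pairs)    # [('415', '483-2925'), ('428', '243-9848')]

# finditer() gives match objects, so group() still returns the whole match.
full_matches = [mo.group() for mo in grouped_regex.finditer(text)]
print(full_matches)   # ['415-483-2925', '428-243-9848']
```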
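Character Classes
Shorthand character class | Represents |
---|---|
\d | Any digit from 0 to 9 |
\D | Any character that is not a digit from 0 to 9 |
\w | Any letter, digit, or underscore (think of this as matching "word" characters) |
\W | Any character that is not a letter, digit, or underscore |
\s | Any space, tab, or newline (think of this as matching "whitespace" characters) |
\S | Any character that is not a space, tab, or newline |
import re
xmasRegex = re.compile(r'\d+\s\w+')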
mo = xmasRegex.search('12 drummers,11 pipers,10 lords,9 ladies,8 maids,7 swans,6 geese,5 rings,4 birds,2 doves,1 partridge')
print(mo.group())
12 drummers
import re
xmasRegex = re.compile(r'\d+\s\w+')
mo = xmasRegex.findall('12 drummers,11 pipers,10 lords,9 ladies,8 maids,7 swans,6 geese,5 rings,4 birds,2 doves,1 partridge')
for m in mo:
    print(m)
12 drummers
11 pipers
10 lords
9 ladies
8 maids
7 swans
6 geese
5 rings
4 birds
2 doves
1 partridge
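To get a feel for how the shorthand classes differ, the sketch below applies three of them to one string. The sample text is made up for illustration.

```python
import re

text = 'Order #42: 3 apples, 10 pears'

digits = re.findall(r'\d+', text)   # runs of digits
words = re.findall(r'\w+', text)    # runs of letters, digits, or underscores
chunks = re.findall(r'\S+', text)   # runs of anything that is not whitespace

print(digits)   # ['42', '3', '10']
print(words)    # ['Order', '42', '3', 'apples', '10', 'pears']
print(chunks)   # ['Order', '#42:', '3', 'apples,', '10', 'pears']
```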
Making Your Own Character Classes
import re
vowelRegex = re.compile(r'[aeiouAEIOU]')  # matches every vowel, lowercase or uppercase
mo = vowelRegex.findall('I love China')
print(mo)
['I', 'o', 'e', 'i', 'a']
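Placing a caret right after the opening bracket negates the class, so it matches every character that is not listed. As a sketch, the class below excludes vowels and, as an extra assumption for readability, whitespace too:

```python
import re

# [^...] matches any character NOT in the brackets; \s keeps spaces out of the result.
consonant_regex = re.compile(r'[^aeiouAEIOU\s]')
consonants = consonant_regex.findall('I love China')
print(consonants)   # ['l', 'v', 'C', 'h', 'n']
```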
The Caret and Dollar Sign Characters
Usage:
- ^spam: the string must begin with spam
- spam$: the string must end with spam
import re
beginWithRegex = re.compile(r'^86-\d{11}')
mo = beginWithRegex.search('86-18173919192 is my Chinese Number')
print(mo.group())
86-18173919192
import re
wholeStringNumRegex = re.compile(r'^\d+$')  # the entire string must consist of digits
mo = wholeStringNumRegex.search('18173919188')
print(mo.group())
18173919188
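Using ^ and $ together is a common way to validate a whole string. One subtlety is that $ also matches just before a trailing newline, so re.fullmatch() can be the stricter choice. The helper name is_all_digits below is hypothetical, a sketch rather than a recommended validator.

```python
import re

def is_all_digits(text):
    """True if text consists only of digits (hypothetical helper)."""
    return re.fullmatch(r'\d+', text) is not None

print(is_all_digits('18173919188'))   # True
print(is_all_digits('1817-3919'))     # False

# '^\d+$' still accepts a string that ends with a newline...
loose = re.search(r'^\d+$', '123\n') is not None
print(loose)                          # True
# ...whereas fullmatch() rejects it.
print(is_all_digits('123\n'))         # False
```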
The Wildcard Character
Matching Everything with Dot-Star
import re
nameRegex = re.compile(r'First Name:(.*) Last Name:(.*)')
mo = nameRegex.search('First Name: Jackeroo Last Name: Liu')
print(mo.group())
First Name: Jackeroo Last Name: Liu
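Matching Newlines with the Dot Character
Dot-star matches every character except a newline. Passing re.DOTALL as the second argument to re.compile() makes the dot match every character, including newlines.
import re
noNewlineRegex = re.compile(r'.*')  # matches everything up to the first newline
mo = noNewlineRegex.search('Serve the public trust,\nProtect the innocent.\nUphold the law.').group()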
print(mo)
Serve the public trust,
import re
newlineRegex = re.compile(r'.*', re.DOTALL)  # matches everything, including newlines
mo = newlineRegex.search('Serve the public trust,\nProtect the innocent.\nUphold the law.').group()
print(mo)
Serve the public trust,
Protect the innocent.
Uphold the law.
Case-Insensitive Matching
Sometimes only the letters matter and their case does not. To make a regex case-insensitive, pass re.IGNORECASE (or its shorthand re.I) as the second argument to re.compile().
import re
serveRegex = re.compile(r'serve', re.IGNORECASE)
mo = serveRegex.search('Serve the public trust,\nProtect the innocent.\nUphold the law.')
print(mo.group())
Serve
Substituting Strings with the sub() Method
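The notes give no example for this heading, so here is a minimal sketch of how sub() is typically used. The first argument is the replacement string and the second is the text to search. In the replacement, \1 refers to the text captured by group 1. The sentences and variable names are made up for illustration.

```python
import re

# Replace every match with a fixed string.
names_regex = re.compile(r'Agent \w+')
censored = names_regex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
print(censored)   # CENSORED gave the secret documents to CENSORED.

# Reuse part of the match: keep only the first letter of each name.
initial_regex = re.compile(r'Agent (\w)\w*')
masked = initial_regex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
print(masked)     # A**** told C**** that E**** knew B**** was a double agent.
```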
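Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE
- re.IGNORECASE: matches without regard to case
- re.DOTALL: makes the dot match every character, including newlines
- re.VERBOSE: lets you add whitespace and comments to the pattern, which the regex engine ignores; typically used for more complex patterns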
import re
someRegex = re.compile('foot',re.IGNORECASE|re.DOTALL|re.VERBOSE)
mo = someRegex.search('Football is one of my favorite sports')
print(mo.group())
Foot
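The example above passes re.VERBOSE but does not yet use it. The sketch below spreads a pattern over several lines with comments. Note that plain spaces in a verbose pattern are ignored, so a literal space has to be written inside a character class or escaped. The pattern and the sample sentence are made up for illustration.

```python
import re

football_regex = re.compile(r'''
    foot    # first half of the word
    [ ]?    # optional literal space; bare spaces are ignored in verbose mode
    ball    # second half of the word
    ''', re.IGNORECASE | re.VERBOSE)

found = football_regex.findall('I like Foot Ball and FOOTBALL')
print(found)   # ['Foot Ball', 'FOOTBALL']
```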
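Commonly Used Regular Expressions
A Regex for Phone Numbers
import re
#phoneRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # area code, optionally in parentheses
    (\s|-|\.)?                       # separator
    (\d{3})                          # first 3 digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # optional extension
    )''', re.VERBOSE)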
mo = phoneRegex.search('My number is: 415-213-2341')
print(mo.group())
415-213-2341
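Because this pattern contains several groups, findall() returns one tuple per match, and the first item of each tuple is the outermost group, that is, the whole phone number. A sketch of pulling every number out of a longer text, with a sample sentence made up for illustration:

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # area code, optionally in parentheses
    (\s|-|\.)?                       # separator
    (\d{3})                          # first 3 digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # optional extension
    )''', re.VERBOSE)

text = 'Call 415-213-2341 or (415) 555-1234 ext 22.'

# Item 0 of each tuple is the outermost group: the full number.
numbers = [groups[0] for groups in phone_regex.findall(text)]
print(numbers)   # ['415-213-2341', '(415) 555-1234 ext 22']
```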
A Regex for Email Addresses
import re
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+     # username
    @                     # @ symbol
    [a-zA-Z0-9.-]+        # domain name
    (\.[a-zA-Z]{2,4})     # dot-something
    )''', re.VERBOSE)
mo = emailRegex.search('My email address is: someone@example.com')  # placeholder address
print(mo.group())
someone@example.com
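The same pattern can collect every address in a text. As with the phone regex, findall() returns tuples here, and the first item is the full address. The addresses below are placeholders on example domains. Note also that {2,4} caps the last part at four letters, so longer top-level domains would be matched only partially.

```python
import re

email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+     # username
    @                     # @ symbol
    [a-zA-Z0-9.-]+        # domain name
    (\.[a-zA-Z]{2,4})     # dot-something
    )''', re.VERBOSE)

text = 'Contact alice@example.com or bob.smith@mail.example.org today.'

# Item 0 of each tuple is the outermost group: the full address.
emails = [groups[0] for groups in email_regex.findall(text)]
print(emails)   # ['alice@example.com', 'bob.smith@mail.example.org']
```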