Python Regular Expressions: Study Notes

Regular Expressions

Creating a Regular Expression

import re
phoneNumberRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumberRegex.search('my number is 415-483-2925')
print('Phone Number Found: ' + mo.group())
Phone Number Found: 415-483-2925

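Common Matching Patterns

Grouping with Parentheses

import re
phoneNumberRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumberRegex.search('my number is 415-483-2925')
print(mo.group())    # the whole match
print(mo.group(1))   # first group: area code
print(mo.group(2))   # second group: main number
415-483-2925
415
483-2925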
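
As a small extension of the grouping example, the sketch below uses groups() to fetch every captured group at once, and shows that search() returns None when nothing matches. The names areaCode, mainNumber, and missing are chosen here for illustration.

```python
import re

phoneNumberRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumberRegex.search('my number is 415-483-2925')

# groups() returns every captured group at once, as a tuple.
areaCode, mainNumber = mo.groups()
print(areaCode)
print(mainNumber)

# search() returns None when nothing matches, so check before calling group().
missing = phoneNumberRegex.search('no number here')
print(missing is None)
```

Checking for None first avoids an AttributeError when group() is called on a failed search.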

Matching Multiple Alternatives with the Pipe

import re
heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey.')
mo2 = heroRegex.search('Tina Fey and Batman')
print(mo1.group())
print(mo2.group())
Batman
Tina Fey

Optional Matching with the Question Mark

import re
batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
mo2 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())
print(mo2.group())
Batman
Batwoman

Matching Zero or More Times with the Star

import re
batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
mo2 = batRegex.search('The Adventures of Batwowowowowowoman')
print(mo1.group())
print(mo2.group())
Batman
Batwowowowowowoman

Matching One or More Times with the Plus

import re
batRegex = re.compile(r'Bat(wo)+man')
mo1 = batRegex.search('The Adventures of Batwowoman')
mo2 = batRegex.search('The Adventures of Batwowowowowowoman')
print(mo1.group())
print(mo2.group())
Batwowoman
Batwowowowowowoman

Matching a Specific Number of Repetitions with Curly Braces

import re
batRegex = re.compile(r'Bat(wo){6}man')
mo1 = batRegex.search('The Adventures of Batwowoman')  # None: only two 'wo' repetitions, not six
mo2 = batRegex.search('The Adventures of Batwowowowowowoman')
print(mo2.group())
Batwowowowowowoman

Matching Everything with Dot-Star

import re
nameRegex = re.compile(r'First Name:(.*) Last Name:(.*)')
mo = nameRegex.search('First Name: Jackeroo Last Name: Liu')
print(mo.group())
First Name: Jackeroo Last Name: Liu

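Greedy and Non-greedy Matching

Python's regular expressions are greedy by default: when a pattern is ambiguous, they match the longest string possible. The non-greedy version of the curly braces, written with a question mark after the closing brace, matches the shortest string possible.

import re
greedyHaRegex = re.compile(r'(Ha){3,5}')
mo1 = greedyHaRegex.search('HaHaHaHaHa')
print(mo1.group())
HaHaHaHaHa

import re
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
print(mo2.group())
HaHaHa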
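
The question-mark suffix also works on the other quantifiers. The sketch below contrasts greedy and non-greedy dot-star; the sample string and variable names are made up for illustration.

```python
import re

text = '<To serve man> for dinner.>'

# Greedy: .* grabs as much as possible, so the match runs to the last '>'.
greedy = re.compile(r'<.*>').search(text).group()

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
nongreedy = re.compile(r'<.*?>').search(text).group()

print(greedy)
print(nongreedy)
```

The greedy pattern returns the whole string, while the non-greedy one returns only '<To serve man>'.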

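The findall() Method

search() returns only the first match, whereas the findall() method of a Regex object returns every match. When the pattern has no groups, the result is a list of strings; when it has two or more groups, the result is a list of tuples.

import re
phoneNumberRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumberRegex.search('my home number is 415-483-2925, and work number is 428-243-9848')
print(mo.group())
415-483-2925

import re
phoneNumberRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
matches = phoneNumberRegex.findall('my home number is 415-483-2925, and work number is 428-243-9848')
print(matches)
['415-483-2925', '428-243-9848']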
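
The shape of the list returned by findall() depends on whether the pattern contains groups. As a minimal sketch using the same sample sentence (the names no_groups and with_groups are illustrative):

```python
import re

text = 'my home number is 415-483-2925, and work number is 428-243-9848'

# No groups: each result is the full matched string.
no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d').findall(text)

# Two groups: each result is a tuple of (area code, main number).
with_groups = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)').findall(text)

print(no_groups)
print(with_groups)
```

With the grouped pattern, the result is [('415', '483-2925'), ('428', '243-9848')].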

Character Classes

Shorthand class    Represents
\d    Any digit from 0 to 9
\D    Any character that is not a digit from 0 to 9
\w    Any letter, digit, or the underscore character (think of this as matching "word" characters)
\W    Any character that is not a letter, digit, or underscore
\s    Any space, tab, or newline character (think of this as matching "whitespace" characters)
\S    Any character that is not a space, tab, or newline

import re
xmasRegex = re.compile(r'\d+\s\w+')
mo = xmasRegex.search('12 drummers,11 pipers,10 lords,9 ladies,8 maids,7 swans,6 geese,5 rings,4 birds,2 doves,1 partridge')
print(mo.group())
12 drummers

import re
xmasRegex = re.compile(r'\d+\s\w+')
matches = xmasRegex.findall('12 drummers,11 pipers,10 lords,9 ladies,8 maids,7 swans,6 geese,5 rings,4 birds,2 doves,1 partridge')
for m in matches:
    print(m)
12 drummers
11 pipers
10 lords
9 ladies
8 maids
7 swans
6 geese
5 rings
4 birds
2 doves
1 partridge

Making Your Own Character Classes

import re
vowelRegex = re.compile(r'[aeiouAEIOU]')  # match every vowel, upper or lower case
mo = vowelRegex.findall('I love China')
print(mo)
['I', 'o', 'e', 'i', 'a']

The Caret and Dollar Sign Characters

Usage:

  • ^spam: the string must begin with spam
  • spam$: the string must end with spam

import re
beginWithRegex = re.compile(r'^86-\d{11}')
mo = beginWithRegex.search('86-18173919192 is my Chinese Number')
print(mo.group())
86-18173919192

import re
wholeStringIsNumRegex = re.compile(r'^\d+$')  # the entire string must be digits
mo = wholeStringIsNumRegex.search('18173919188')
print(mo.group())
18173919188

The Wildcard Character

Matching Everything with Dot-Star

import re
nameRegex = re.compile(r'First Name:(.*) Last Name:(.*)')
mo = nameRegex.search('First Name: Jackeroo Last Name: Liu')
print(mo.group())
First Name: Jackeroo Last Name: Liu

Matching Newlines with the Dot Character

Dot-star matches every character except a newline. Passing re.DOTALL as the second argument to re.compile() makes the dot match all characters, including newlines.

import re
noNewlineRegex = re.compile('.*')  # match everything up to the first newline
mo = noNewlineRegex.search('Serve the public trust, \nProtect the innocent. \nUphold the law.').group()
print(mo)
Serve the public trust, 

import re
newlineRegex = re.compile('.*', re.DOTALL)  # match everything, including newlines
mo = newlineRegex.search('Serve the public trust, \nProtect the innocent. \nUphold the law.').group()
print(mo)
Serve the public trust, 
Protect the innocent. 
Uphold the law.

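Case-Insensitive Matching

Sometimes you only care about matching the letters, not whether they are uppercase or lowercase. To make a regex case-insensitive, pass re.IGNORECASE (or its short form, re.I) as the second argument to re.compile().

import re
caseInsensitiveRegex = re.compile('serve', re.IGNORECASE)
mo = caseInsensitiveRegex.search('Serve the public trust, \nProtect the innocent. \nUphold the law.')
print(mo.group())
Serve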

Substituting Strings with the sub() Method
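
A Regex object's sub() method takes a replacement string and the text to search, and returns the text with every match replaced. The sketch below is only an illustration: the sample sentences and the names namesRegex, initialRegex, censored, and masked are made up for this example.

```python
import re

# Match the word 'Agent' followed by a name made of word characters.
namesRegex = re.compile(r'Agent \w+')
censored = namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
print(censored)

# \1 in the replacement refers to the text matched by group 1,
# here the first letter of each agent's name.
initialRegex = re.compile(r'Agent (\w)\w*')
masked = initialRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew.')
print(masked)
```

The replacement string is written as a raw string so that \1 reaches the regex engine as a group reference rather than being treated as an escape by Python.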

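Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE

  • re.IGNORECASE: match without regard to case
  • re.DOTALL: let the dot match every character, including newlines
  • re.VERBOSE: allow whitespace and comments inside the pattern, which the regex engine ignores; usually used for more complex patterns
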
import re
someRegex = re.compile('foot',re.IGNORECASE|re.DOTALL|re.VERBOSE)
mo = someRegex.search('Football is one of my favorite sports')
print(mo.group())
Foot
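
The example above passes all three flags, but its pattern exercises only re.IGNORECASE. As a rough sketch in which each flag has a visible effect (the pattern, sample text, and the names sportRegex and matched are assumptions made for illustration):

```python
import re

# With re.VERBOSE, whitespace and # comments inside the pattern are ignored.
sportRegex = re.compile(r'''
    foot      # prefix, matched case-insensitively thanks to re.IGNORECASE
    ball      # rest of the word
    .*        # anything after it; re.DOTALL lets the dot cross the newline
    sports    # final word of the sample text
    ''', re.IGNORECASE | re.DOTALL | re.VERBOSE)

mo = sportRegex.search('Football is one of\nmy favorite sports')
matched = mo.group()
print(matched)

# Without re.DOTALL the dot cannot cross the newline, so the search fails.
withoutDotall = re.compile(r'football.*sports', re.IGNORECASE)
print(withoutDotall.search('Football is one of\nmy favorite sports') is None)
```

The flags are combined with the bitwise-or operator, exactly as in the example above.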

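Commonly Used Regular Expressions

A Regular Expression for Phone Numbers

import re
# phoneRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # optional area code, with or without parentheses
    (\s|-|\.)?                       # optional separator
    (\d{3})                          # first three digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last four digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # optional extension
    )''', re.VERBOSE)
mo = phoneRegex.search('My number is: 415-213-2341')
print(mo.group())
415-213-2341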

A Regular Expression for Email Addresses

import re
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something, such as .com
    )''', re.VERBOSE)
mo = emailRegex.search('My email address is: jasonwons@gmail.com')
print(mo.group())
jasonwons@gmail.com