Python Regular Expression Cheat Sheet: 50 High-Precision Patterns with re / regex

October 10, 2025

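Regular expressions are an essential skill for string processing in Python. They come up in many situations: validating email addresses, parsing log files, cleansing data, and more.

This article presents 50 selected regular expression patterns built on Python's re module. It moves systematically from basic matching through practical validation to advanced techniques, with patterns you can put to work right away.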

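Quick reference: 50 regular expression patterns

#	Category	Pattern name	Regular expression	Use
1	Basic matching	Exact match	`^pattern$`	Match the whole string
2	Basic matching	Partial match	`pattern`	Match anywhere in the string
3	Basic matching	Prefix match	`^pattern`	Match at the start of the string
4	Basic matching	Suffix match	`pattern$`	Match at the end of the string
5	Basic matching	Case-insensitive	`(?i)pattern`	Ignore letter case
6	Character classes	Digits only	`\d+`	One or more digits
7	Character classes	Letters only	`[a-zA-Z]+`	One or more ASCII letters
8	Character classes	Word characters	`\w+`	One or more letters, digits, or underscores
9	Character classes	Whitespace	`\s+`	One or more whitespace characters
10	Character classes	Non-digits	`\D+`	One or more non-digit characters
11	Quantifiers	Zero or more	`pattern*`	Zero or more occurrences
12	Quantifiers	One or more	`pattern+`	One or more occurrences
13	Quantifiers	Zero or one	`pattern?`	Optional pattern
14	Quantifiers	Exactly N	`pattern{n}`	Exactly n repetitions
15	Quantifiers	N or more	`pattern{n,}`	At least n repetitions
16	Quantifiers	N to M	`pattern{n,m}`	Between n and m repetitions
17	Grouping	Capturing group	`(pattern)`	Group a pattern and extract it
18	Grouping	Non-capturing group	`(?:pattern)`	Group without extracting
19	Grouping	Named group	`(?P<name>pattern)`	Group accessible by name
20	Grouping	Backreference	`\1, \2`	Refer to an earlier group
21	Alternation	OR	`pattern1|pattern2`	Either pattern
22	Boundaries	Word boundary	`\b`	Boundary of a word
23	Boundaries	Non-word boundary	`\B`	Anywhere except a word boundary
24	Lookaround	Positive lookahead	`(?=pattern)`	Position followed by the pattern
25	Lookaround	Negative lookahead	`(?!pattern)`	Position not followed by the pattern
26	Lookaround	Positive lookbehind	`(?<=pattern)`	Position preceded by the pattern
27	Lookaround	Negative lookbehind	`(?<!pattern)`	Position not preceded by the pattern
28	Email validation	Email address	`^[\w\.-]+@[\w\.-]+\.\w+$`	Basic email address format
29	URL validation	URL	`https?://[\w/:%#\$&\?\(\)~\.=\+\-]+`	HTTP/HTTPS URL
30	Phone numbers	Japanese phone number	`0\d{1,4}-?\d{1,4}-?\d{4}`	With or without hyphens
31	Phone numbers	Mobile phone	`0[789]0-?\d{4}-?\d{4}`	Mobile phone number
32	Dates	YYYY-MM-DD	`\d{4}-\d{2}-\d{2}`	ISO-format date
33	Dates	YYYY/MM/DD	`\d{4}/\d{2}/\d{2}`	Slash-separated date
34	Time	HH:MM:SS	`\d{2}:\d{2}:\d{2}`	Time of day
35	Numbers	Integer	`^-?\d+$`	Positive or negative integer
36	Numbers	Decimal	`^-?\d+\.\d+$`	Number with a decimal point
37	Numbers	Comma-separated	`\d{1,3}(,\d{3})*`	Number with thousands separators
38	IP address	IPv4	`\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`	IPv4 address
39	Postal code	Japanese postal code	`\d{3}-?\d{4}`	With or without a hyphen
40	HTML/XML	HTML tag	`<[^>]+>`	A whole HTML tag
41	HTML/XML	Content of a specific tag	`<tag>(.*?)</tag>`	Extract the content between tags
42	File paths	File extension	`\.([a-zA-Z0-9]+)$`	Extract the file extension
43	File paths	File name	`([^/\\]+)$`	Extract the file name from a path
44	String processing	Collapse whitespace	`\s+`	Collapse runs of whitespace into one
45	String processing	Line breaks	`[\r\n]+`	Detect line-break characters
46	Password	Mixed letters, digits, symbols	`^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&]).{8,}$`	Strength check
47	Full-width / half-width	Full-width katakana	`[ァ-ヶー]+`	Full-width katakana only
48	Full-width / half-width	Half-width katakana	`[ｦ-ﾟ]+`	Half-width katakana only
49	Full-width / half-width	Hiragana	`[ぁ-ん]+`	Hiragana only
50	Full-width / half-width	Kanji	`[一-龯]+`	Kanji only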

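Background

Why regular expressions matter in Python

Python's re module is the standard library for pattern matching on strings. Regular expressions let you express complex string operations in compact code.

They make possible flexible matching that is awkward with ordinary string methods such as find(), replace(), and split(). Tasks like checking the format of an email address or extracting URLs can be implemented in a few lines.

The diagram below shows the basic flow of regular expression processing in Python.

```mermaid
flowchart LR
  input["Pattern string"] -->|define pattern| compile["re.compile()"]
  compile -->|compiled<br/>regex object| pattern["Pattern object"]
  pattern -->|search / match| methods["match()<br/>search()<br/>findall()<br/>sub()"]
  text["Input string"] -->|text to search| methods
  methods -->|result| result["Match object<br/>or<br/>processed string"]
```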

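Basic building blocks of a regular expression

A regular expression is built by combining the following elements.

Literal characters: ordinary characters (e.g. abc)
Metacharacters: characters with a special meaning (e.g. ., *, +)
Character classes: sets of characters (e.g. [a-z], \d)
Quantifiers: repetition (e.g. *, +, {n,m})
Anchors: positions (e.g. ^, $, \b)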
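
As a minimal sketch of how these elements combine, the following pattern uses a literal, an escaped metacharacter, a character class, a quantifier, and both anchors. The version-string format and the sample inputs are made up for illustration.

```python
import re

# literal "v" + class \d + quantifier + + escaped metacharacter \. + anchors ^ and $
pattern = re.compile(r'^v\d+\.\d+$')

samples = ["v1.20", "v1", "av1.2", "v1x2"]
results = {s: bool(pattern.match(s)) for s in samples}
print(results)
```

Only "v1.20" satisfies every element at once; each of the other samples breaks exactly one of them.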

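Challenges

Common challenges when learning regular expressions

People learning regular expressions tend to run into the following problems.

The syntax is hard to read

Regular expression notation is idiosyncratic and hard to follow at first sight. Advanced constructs such as (?<=pattern) raise the difficulty further.

Finding practical patterns takes time

Working out the patterns needed in real work (email addresses, phone numbers, and so on) from scratch every time is inefficient. A trustworthy pattern collection speeds up development considerably.

Debugging is difficult

When a regular expression does not behave as expected, it is hard to pinpoint which part is at fault.

The diagram below summarizes these typical challenges.

```mermaid
flowchart TD
  start["Learning regular expressions"] --> challenge1["Challenge 1: complex syntax"]
  start --> challenge2["Challenge 2: finding patterns"]
  start --> challenge3["Challenge 3: hard to debug"]

  challenge1 --> solution1["Solution:<br/>learn from examples<br/>in a pattern collection"]
  challenge2 --> solution2["Solution:<br/>reusable<br/>pattern library"]
  challenge3 --> solution3["Solution:<br/>step-by-step testing<br/>and verification tools"]

  solution1 --> goal["Efficient mastery of<br/>regular expressions"]
  solution2 --> goal
  solution3 --> goal
```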

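Solution

Learn systematically from 50 selected patterns

This article selects 50 regular expression patterns that come up often in practice and explains them in the following categories.

Basic matching (patterns 1-5)
Character classes and quantifiers (patterns 6-16)
Grouping and references (patterns 17-20)
Alternation and boundaries (patterns 21-23)
Lookahead and lookbehind (patterns 24-27)
Practical validation (patterns 28-50)

Each pattern comes with working code and an explanation so that it can be used immediately.

Main functions of the Python re module

The re module provides the following main functions.

Function	Description	Return value
`re.match()`	Match at the start of the string only	Match object or None
`re.search()`	Scan the string and return the first match	Match object or None
`re.findall()`	Find all non-overlapping matches	List of strings (tuples when the pattern has more than one group)
`re.finditer()`	Find all non-overlapping matches lazily	Iterator of Match objects
`re.sub()`	Replace the parts that match the pattern	The string after replacement
`re.split()`	Split the string at each match	List of substrings
`re.compile()`	Compile a regular expression	Pattern object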
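
To make the table concrete, here is a small sketch that applies each function to the same text. The sample text and variable names are chosen for this illustration only.

```python
import re

text = "2024-03-15, 2025-10-10"
date = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

at_start = date.match(text) is not None           # match() only tries position 0
first = date.search(text).group()                 # first match anywhere in the text
all_groups = date.findall(text)                   # tuples, because the pattern has three groups
spans = [m.span() for m in date.finditer(text)]   # Match objects expose positions
swapped = date.sub(r'\3/\2/\1', text)             # reorder the captured parts
parts = re.split(r',\s*', text)                   # split on a comma plus optional spaces

print(at_start, first, all_groups, spans, swapped, parts, sep="\n")
```

A compiled Pattern object offers the same operations as methods, which is why `date.search(...)` and `re.search(pattern, ...)` behave alike.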

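Examples

Category 1: Basic matching (patterns 1-5)

Let's start with the basic matching patterns.

Pattern 1: Exact match

Checks whether the whole string matches a given pattern.

```python
import re

# Exact-match pattern
pattern = r'^hello$'

# A string that matches
text1 = "hello"
result1 = re.match(pattern, text1)
print(f"'{text1}' matches: {result1 is not None}")  # True

# A string that does not match
text2 = "hello world"
result2 = re.match(pattern, text2)
print(f"'{text2}' matches: {result2 is not None}")  # False
```

^ marks the start of the string and $ marks the end, so wrapping a pattern in both gives an exact match. Note that $ also matches just before a trailing newline; when that matters, use re.fullmatch() or \Z instead.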
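
The trailing-newline behavior of $ is easy to overlook, so here is a short sketch comparing it with `\Z` and `re.fullmatch()`. The sample string is made up for illustration.

```python
import re

text = "hello\n"  # note the trailing newline

dollar = re.match(r'^hello$', text) is not None     # $ also matches just before a final newline
strict_z = re.match(r'^hello\Z', text) is not None  # \Z matches only at the very end
full = re.fullmatch(r'hello', text) is not None     # the whole string must match

print(dollar, strict_z, full)
```

For input validation, `re.fullmatch()` is usually the least surprising choice because it needs no anchors at all.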

Pattern 2: Partial match

Checks whether a string contains a given pattern.

```python
import re

# Partial-match pattern
pattern = r'world'

# Use search() to look inside the string
text = "hello world"
result = re.search(pattern, text)

# Inspect the matched part
if result:
    print(f"Match: {result.group()}")  # world
    print(f"Position: {result.start()}-{result.end()}")  # 6-11
```

re.search() scans the string for the first match. Unlike re.match(), the match does not have to begin at the start of the string.

Pattern 3: Prefix match

Checks whether a string starts with a given pattern.

```python
import re

# Prefix-match pattern
pattern = r'^https://'

# Check whether each URL starts with https
urls = [
    "https://example.com",
    "http://example.com",
    "ftp://example.com"
]

for url in urls:
    if re.match(pattern, url):
        print(f"✓ {url} uses HTTPS")
    else:
        print(f"✗ {url} does not use HTTPS")
```

Pattern 4: Suffix match

Checks whether a string ends with a given pattern.

```python
import re

# Suffix-match pattern (is it a .py file?)
pattern = r'\.py$'

# List of file names
files = ["script.py", "data.json", "main.py", "readme.txt"]

# Keep only the Python files
python_files = [f for f in files if re.search(pattern, f)]
print(f"Python files: {python_files}")  # ['script.py', 'main.py']
```

. is a metacharacter, so it must be escaped as \. to be treated as a literal dot.

Pattern 5: Case-insensitive matching

Matches without distinguishing upper and lower case.

```python
import re

# Case-insensitive pattern using the inline flag
pattern = r'(?i)python'

# Alternatively, pass the re.IGNORECASE flag
pattern_alt = r'python'
text = "I love Python and PYTHON!"

# Method 1: inline flag
matches1 = re.findall(pattern, text)
print(f"Matches (inline flag): {matches1}")  # ['Python', 'PYTHON']

# Method 2: flag argument
matches2 = re.findall(pattern_alt, text, re.IGNORECASE)
print(f"Matches (flag argument): {matches2}")  # ['Python', 'PYTHON']
```

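Category 2: Character classes and quantifiers (patterns 6-16)

These patterns control which kinds of characters match and how often they repeat.

Pattern 6: Digits only

```python
import re

# Digits-only pattern
pattern = r'\d+'

# Extract the numbers from a string
text = "Order no.: 12345, amount: 9800 yen"
numbers = re.findall(pattern, text)
print(f"Extracted numbers: {numbers}")  # ['12345', '9800']
```

+ means one or more repetitions. For str patterns, \d matches any Unicode decimal digit, including full-width digits; it is the same as [0-9] only for bytes patterns or when the re.ASCII flag is set.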
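
The scope of `\d` matters when input may contain full-width digits, which is common in Japanese text. A minimal sketch, with a sample string made up for illustration:

```python
import re

text = "half-width 123, full-width ４５６"

unicode_digits = re.findall(r'\d+', text)          # default for str patterns: Unicode digits
ascii_digits = re.findall(r'\d+', text, re.ASCII)  # restrict \d to 0-9
explicit = re.findall(r'[0-9]+', text)             # an explicit class is ASCII-only as well

print(unicode_digits, ascii_digits, explicit)
```

Which behavior is right depends on the data; the point is to choose it deliberately rather than assume `\d` means `[0-9]`.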

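Pattern 7: Letters only

```python
import re

# ASCII-letters-only pattern
pattern = r'[a-zA-Z]+'

# Extract only the letters from a string
text = "Product123 is $49.99"
words = re.findall(pattern, text)
print(f"Letter runs: {words}")  # ['Product', 'is']
```

Pattern 8: Word characters

```python
import re

# Pattern for letters, digits, and underscores
pattern = r'\w+'

# Check which strings are valid user names
usernames = ["user_123", "test-user", "admin@site", "valid_name"]

# The whole name must consist of word characters only
for username in usernames:
    if re.fullmatch(pattern, username):
        print(f"✓ {username} is valid")
    else:
        print(f"✗ {username} is invalid")
```

\w is equivalent to [a-zA-Z0-9_] only for bytes patterns or when re.ASCII is set. For str patterns, which are Unicode-aware by default, it also matches letters and digits from other scripts such as kana and kanji. re.fullmatch() succeeds only when the entire string matches.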
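
If user names must be limited to ASCII, the Unicode behavior of `\w` needs to be switched off explicitly. A short sketch, with sample names invented for illustration:

```python
import re

names = ["user_123", "山田太郎", "test-user"]

# Default str behavior: \w accepts word characters from any script
unicode_ok = [n for n in names if re.fullmatch(r'\w+', n)]

# With re.ASCII, \w is limited to [a-zA-Z0-9_]
ascii_ok = [n for n in names if re.fullmatch(r'\w+', n, re.ASCII)]

print(unicode_ok, ascii_ok)
```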

Pattern 9: Whitespace

```python
import re

# Whitespace pattern
pattern = r'\s+'

# Collapse runs of whitespace into a single space
text = "This    is   a    test"
normalized = re.sub(pattern, ' ', text)
print(f"Normalized: '{normalized}'")  # 'This is a test'
```

\s matches whitespace characters such as spaces, tabs, and newlines.

Pattern 10: Non-digits

```python
import re

# Extract the runs of non-digit characters
pattern = r'\D+'

text = "ABC123DEF456"
non_digits = re.findall(pattern, text)
print(f"Non-digits: {non_digits}")  # ['ABC', 'DEF']
```

Patterns 11-16: Using quantifiers

Quantifiers control how many times something repeats.

```python
import re

# A range of quantifier patterns
patterns = {
    'zero or more': r'a*',     # zero or more 'a'
    'one or more': r'a+',      # one or more 'a'
    'zero or one': r'a?',      # zero or one 'a'
    'exactly 3': r'a{3}',      # exactly three 'a'
    '3 or more': r'a{3,}',     # three or more 'a'
    '2 to 4': r'a{2,4}'        # between two and four 'a'
}

# Test cases
test_strings = ["", "a", "aa", "aaa", "aaaa", "aaaaa"]

# Try every pattern against every test string
for name, pattern in patterns.items():
    print(f"\n[{name}] pattern: {pattern}")
    for text in test_strings:
        match = re.fullmatch(pattern, text)
        status = "○" if match else "×"
        print(f"  {status} '{text}'")
```

Category 3: Grouping and references (patterns 17-20)

Grouping lets you extract part of a pattern or reuse it.

Pattern 17: Capturing group

```python
import re

# Extract the parts of a date
pattern = r'(\d{4})-(\d{2})-(\d{2})'

text = "Birthday: 1990-05-15"
match = re.search(pattern, text)

# Read each group
if match:
    year = match.group(1)   # first group
    month = match.group(2)  # second group
    day = match.group(3)    # third group
    print(f"Year: {year}, month: {month}, day: {day}")
    # Year: 1990, month: 05, day: 15
```

The parts enclosed in parentheses () become capturing groups, accessible through group(n).

Pattern 18: Non-capturing group

```python
import re

# Group without capturing
pattern = r'(?:http|https)://(\w+\.\w+)'

text = "https://example.com"
match = re.search(pattern, text)

# Only the domain part is captured
if match:
    print(f"Domain: {match.group(1)}")  # example.com
    # The protocol part (http|https) is not captured
```

(?:...) groups without capturing. It keeps group numbering simple and avoids storing captures that are never used.
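
The difference between the two group types is most visible with `re.findall()`, which returns the group contents instead of the whole match as soon as a capturing group is present. A small sketch, using made-up URLs:

```python
import re

text = "http://a.example https://b.example"

# One capturing group: findall() returns only what the group captured
capturing = re.findall(r'(http|https)://\S+', text)

# Non-capturing group: findall() returns the whole match
non_capturing = re.findall(r'(?:http|https)://\S+', text)

print(capturing)
print(non_capturing)
```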

Pattern 19: Named group

```python
import re

# Named groups improve readability
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

text = "2024-03-15"
match = re.search(pattern, text)

# Groups can be accessed by name
if match:
    print(f"Year: {match.group('year')}")    # 2024
    print(f"Month: {match.group('month')}")  # 03
    print(f"Day: {match.group('day')}")      # 15

    # They can also be retrieved as a dictionary
    date_dict = match.groupdict()
    print(f"As a dict: {date_dict}")
    # {'year': '2024', 'month': '03', 'day': '15'}
```

Pattern 20: Backreference

```python
import re

# Detect a word repeated twice in a row
pattern = r'\b(\w+)\s+\1\b'

text = "this this is a test test case"
duplicates = re.findall(pattern, text)
print(f"Duplicated words: {duplicates}")  # ['this', 'test']

# Remove the duplicates
fixed = re.sub(pattern, r'\1', text)
print(f"Fixed: {fixed}")  # 'this is a test case'
```

\1 refers to whatever the first capturing group (\w+) matched.
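
Backreferences can also be combined with named groups, which avoids counting parentheses. The sketch below repeats the duplicate-word example with `(?P=name)` in the pattern and `\g<name>` in the replacement; the group name `word` is an arbitrary choice.

```python
import re

# (?P=word) must match the same text that the group named "word" captured
pattern = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

text = "this this is a test test case"
fixed = pattern.sub(r'\g<word>', text)  # \g<word> inserts the named group
print(fixed)
```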

Category 4: Alternation and boundaries (patterns 21-23)

These techniques choose between several patterns or make use of word boundaries.

Pattern 21: OR

```python
import re

# Match any one of several patterns
pattern = r'cat|dog|bird'

text = "I have a cat and a dog"
matches = re.findall(pattern, text)
print(f"Pets found: {matches}")  # ['cat', 'dog']

# A more involved example: detecting image file extensions
file_pattern = r'\.(jpg|png|gif|bmp)$'
files = ["image.jpg", "photo.png", "document.pdf", "icon.gif"]

image_files = [f for f in files if re.search(file_pattern, f, re.IGNORECASE)]
print(f"Image files: {image_files}")
# ['image.jpg', 'photo.png', 'icon.gif']
```

Pattern 22: Word boundary

```python
import re

# Whole-word match using word boundaries
pattern = r'\bcat\b'

# Match only the word 'cat' (excluding 'catalog' and the like)
texts = [
    "I have a cat",
    "The catalog is here",
    "cat is sleeping",
    "scat!"
]

for text in texts:
    if re.search(pattern, text):
        print(f"✓ '{text}' contains the word 'cat'")
    else:
        print(f"✗ '{text}' does not contain the word 'cat'")
```

\b matches at a word boundary, that is, between a word character and a non-word character (or the start or end of the string).

Pattern 23: Non-word boundary

```python
import re

# Detect a pattern that sits in the middle of a word
pattern = r'\Bcat\B'

text1 = "concatenate"
text2 = "cat"

# Matches the 'cat' inside 'concatenate' but not the standalone 'cat'
print(f"'{text1}': {re.search(pattern, text1) is not None}")  # True
print(f"'{text2}': {re.search(pattern, text2) is not None}")  # False
```

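Category 5: Lookahead and lookbehind (patterns 24-27)

Lookahead and lookbehind are advanced techniques for finding positions that satisfy a condition without consuming any characters.

```mermaid
flowchart LR
  position["Search position"] --> lookahead["Lookahead<br/>(?=...)<br/>(?!...)"]
  position --> lookbehind["Lookbehind<br/>(?<=...)<br/>(?<!...)"]

  lookahead --> check1["Check what<br/>follows the<br/>position"]
  lookbehind --> check2["Check what<br/>precedes the<br/>position"]

  check1 --> match["Match succeeds"]
  check2 --> match
```

Pattern 24: Positive lookahead

```python
import re

# Match only where a given pattern follows
# Example: extract only the numbers followed by '円' (yen)
pattern = r'\d+(?=円)'

text = "The price is 1000円 and the weight is 500g"
prices = re.findall(pattern, text)
print(f"Prices: {prices}")  # ['1000'] (500 is not included)
```

Pattern 25: Negative lookahead

```python
import re

# Match only where a given pattern does not follow
# Example: extract whole words that are not immediately followed by '@'
# The \b anchors stop \w+ from backtracking to a shorter match such as 'user12'
pattern = r'\b\w+\b(?!@)'

text = "user123@example.com admin password"
matches = re.findall(pattern, text)
print(f"Extracted: {matches}")  # ['example', 'com', 'admin', 'password']
```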
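
A negative lookahead placed directly after a greedy quantifier is a common trap: when the lookahead fails, the quantifier gives back one character and the lookahead is tried again. The sketch below contrasts an unanchored `\w+(?!@)` with a version bounded by `\b`, on a made-up sample text.

```python
import re

text = "user123@example.com admin password"

# Unanchored: \w+ backtracks from 'user123' to 'user12', which is not followed by '@'
naive = re.findall(r'\w+(?!@)', text)

# Bounded: \b forces whole words, so 'user123' is skipped entirely
bounded = re.findall(r'\b\w+\b(?!@)', text)

print(naive)
print(bounded)
```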

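Pattern 26: Positive lookbehind

```python
import re

# Match only where a given pattern precedes
# Example: extract the number that comes right after '$'
pattern = r'(?<=\$)\d+'

text = "Price: $100, quantity: 50"
prices = re.findall(pattern, text)
print(f"Dollar prices: {prices}")  # ['100']
```

Pattern 27: Negative lookbehind

```python
import re

# Match only where a given pattern does not precede
# Example: find '#' tags that are not at the start of a line
# (with re.MULTILINE, ^ matches at the start of every line)
pattern = r'(?<!^)#(?P<tag>\w+)'

text = "# Title\nBody #tag1 #tag2"
tags = re.findall(pattern, text, re.MULTILINE)
print(f"Hashtags: {tags}")  # ['tag1', 'tag2']
```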

Category 6: Practical validation (patterns 28-50)

These are validation patterns that come up often in real work.

Pattern 28: Email address

```python
import re

# Basic email address check (a simplified format test, not full RFC validation)
pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'

emails = [
    "user@example.com",        # valid
    "test.user@domain.co.jp",  # valid
    "invalid@",                # invalid
    "@example.com",            # invalid
    "no-at-sign.com"           # invalid
]

for email in emails:
    if re.match(pattern, email):
        print(f"✓ {email} is valid")
    else:
        print(f"✗ {email} is invalid")
```

Pattern 29: URL

```python
import re

# Extract HTTP/HTTPS URLs
pattern = r'https?://[\w/:%#\$&\?\(\)~\.=\+\-]+'

text = "See https://example.com/page?id=123 for details"
urls = re.findall(pattern, text)
print(f"Extracted URLs: {urls}")  # ['https://example.com/page?id=123']
```

Patterns 30-31: Phone numbers

```python
import re

# Japanese landline and mobile phone numbers
landline_pattern = r'0\d{1,4}-?\d{1,4}-?\d{4}'
mobile_pattern = r'0[789]0-?\d{4}-?\d{4}'

phone_numbers = [
    "03-1234-5678",     # landline
    "0312345678",       # landline (no hyphens)
    "090-1234-5678",    # mobile
    "08012345678"       # mobile (no hyphens)
]

# Check the mobile pattern first, because the landline pattern also matches mobile numbers.
# fullmatch() rejects numbers with extra trailing characters.
for number in phone_numbers:
    if re.fullmatch(mobile_pattern, number):
        print(f"📱 {number} is a mobile number")
    elif re.fullmatch(landline_pattern, number):
        print(f"☎️  {number} is a landline number")
```

Patterns 32-34: Dates and times

```python
import re

# Several date formats (these check the shape only, not whether the date exists)
date_patterns = {
    'ISO': r'\d{4}-\d{2}-\d{2}',
    'slash': r'\d{4}/\d{2}/\d{2}',
    'dotted (era-style)': r'\d{2}\.\d{2}\.\d{2}'
}

# Time of day (HH:MM:SS)
time_pattern = r'\d{2}:\d{2}:\d{2}'

datetime_text = "Event: 2024-03-15 14:30:00"
date_match = re.search(date_patterns['ISO'], datetime_text)
time_match = re.search(time_pattern, datetime_text)

if date_match and time_match:
    print(f"Date: {date_match.group()}")  # 2024-03-15
    print(f"Time: {time_match.group()}")  # 14:30:00
```
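
A pattern such as `\d{4}-\d{2}-\d{2}` checks only the shape of a date, so it also accepts strings like 2024-13-45. One way to cover that gap, sketched below, is to let the regular expression check the format and `datetime.strptime()` check the calendar. The helper name `is_valid_iso_date` is invented for this illustration.

```python
import re
from datetime import datetime

def is_valid_iso_date(text):
    """Return True if text looks like YYYY-MM-DD and is a real calendar date."""
    # Step 1: the regular expression enforces the exact shape
    if not re.fullmatch(r'\d{4}-\d{2}-\d{2}', text):
        return False
    # Step 2: strptime() rejects impossible months and days
    try:
        datetime.strptime(text, '%Y-%m-%d')
    except ValueError:
        return False
    return True

for candidate in ["2024-02-29", "2023-02-29", "2024-13-45", "2024-2-9"]:
    print(candidate, is_valid_iso_date(candidate))
```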

Patterns 35-37: Numbers

```python
import re

# Several number formats
patterns = {
    'integer': r'^-?\d+$',
    'decimal': r'^-?\d+\.\d+$',
    'comma-separated number': r'^\d{1,3}(,\d{3})*$'
}

test_numbers = [
    ("123", "integer"),
    ("-456", "integer"),
    ("3.14", "decimal"),
    ("1,000,000", "comma-separated number"),
    ("1,23", "comma-separated number")  # invalid
]

for number, expected_type in test_numbers:
    pattern = patterns[expected_type]
    if re.match(pattern, number):
        print(f"✓ {number} is a valid {expected_type}")
    else:
        print(f"✗ {number} is not a valid {expected_type}")
```

Pattern 38: IPv4 address

```python
import re

# Basic shape check for an IPv4 address (does not check the 0-255 range)
pattern = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# Stricter check that also enforces the 0-255 range for each octet
strict_pattern = r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'

ips = ["192.168.1.1", "255.255.255.255", "256.1.1.1", "10.0.0.256"]

for ip in ips:
    if re.match(strict_pattern, ip):
        print(f"✓ {ip} is a valid IP address")
    else:
        print(f"✗ {ip} is not a valid IP address")
```
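
When the goal is validation rather than extraction, the standard library's `ipaddress` module is an alternative to a hand-written range pattern. This is a different technique from the regular expression above, shown only for comparison; the helper name `is_ipv4` is invented for this sketch.

```python
import ipaddress

def is_ipv4(text):
    """Return True if text parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True

for ip in ["192.168.1.1", "255.255.255.255", "256.1.1.1", "10.0.0.256"]:
    print(ip, is_ipv4(ip))
```

A regular expression remains the better fit for pulling candidate addresses out of free text; the parsed check can then confirm each candidate.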

Pattern 39: Postal code

```python
import re

# Japanese postal code (as in 〒123-4567), with or without the hyphen
pattern = r'\d{3}-?\d{4}'

addresses = [
    "〒100-0001 Chiyoda-ku, Tokyo",
    "Postal code: 5500001",
    "Osaka-shi, Osaka 530-0001"
]

for address in addresses:
    match = re.search(pattern, address)
    if match:
        print(f"Postal code: {match.group()}")
```
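
Because `\d{3}-?\d{4}` is unanchored, `re.search()` will also pick seven digits out of a longer number. One way to guard against that, sketched here with a made-up sample text, is to wrap the pattern in digit lookarounds.

```python
import re

text = "ID 1234567890, postal code 100-0001"

# Unanchored: also matches the first seven digits of the ID
naive = re.findall(r'\d{3}-?\d{4}', text)

# (?<!\d) and (?!\d) require that no digit touches the match on either side
bounded = re.findall(r'(?<!\d)\d{3}-?\d{4}(?!\d)', text)

print(naive)
print(bounded)
```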

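Patterns 40-41: HTML/XML tags

```python
import re

# Extract HTML tags
tag_pattern = r'<[^>]+>'

# Extract the content of a specific tag
content_pattern = r'<title>(.*?)</title>'

html = "<html><head><title>Sample Page</title></head><body>Body text</body></html>"

# Extract every tag
tags = re.findall(tag_pattern, html)
print(f"Tags: {tags}")

# Extract the title text
title = re.search(content_pattern, html)
if title:
    print(f"Title: {title.group(1)}")  # Sample Page
```

.*? is a non-greedy match that returns the shortest possible match. By default . does not match a newline, so content that spans several lines needs the re.DOTALL flag.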
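
Tag content often spans more than one line, and in that case `.` needs the `re.DOTALL` flag to cross the newline. A minimal sketch, with an HTML fragment made up for illustration:

```python
import re

html = "<title>Sample\nPage</title>"
pattern = r'<title>(.*?)</title>'

without_flag = re.search(pattern, html)          # None: . stops at the newline
with_flag = re.search(pattern, html, re.DOTALL)  # . now matches the newline too

print(without_flag)
print(repr(with_flag.group(1)))
```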

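Patterns 42-43: File paths

```python
import re

# Extract the file extension
ext_pattern = r'\.([a-zA-Z0-9]+)$'

# Extract the file name from a path
filename_pattern = r'([^/\\]+)$'

paths = [
    "/Users/test/document.pdf",
    "C:\\Documents\\image.jpg",
    "script.py"
]

for path in paths:
    # Extract the extension
    ext_match = re.search(ext_pattern, path)
    if ext_match:
        print(f"Extension: .{ext_match.group(1)}")

    # Extract the file name
    filename_match = re.search(filename_pattern, path)
    if filename_match:
        print(f"File name: {filename_match.group(1)}")
    print("---")
```

Patterns 44-45: String processing

```python
import re

# Runs of whitespace (spaces, tabs, and line breaks)
whitespace_pattern = r'\s+'

# Runs of line-break characters
newline_pattern = r'[\r\n]+'

messy_text = "This   is\n\na    test\r\n\r\nstring"

# Normalize all whitespace to single spaces
normalized = re.sub(whitespace_pattern, ' ', messy_text)
print(f"Whitespace normalized: '{normalized}'")  # 'This is a test string'

# Replace each run of line breaks with a single space (other spacing is kept)
no_newlines = re.sub(newline_pattern, ' ', messy_text)
print(f"Line breaks replaced: '{no_newlines}'")  # 'This   is a    test string'
```

Pattern 46: Password strength check

```python
import re

# At least 8 characters, with an uppercase letter, a lowercase letter, a digit, and a symbol
password_pattern = r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&]).{8,}$'

passwords = [
    "Pass123!",      # valid
    "password",      # lowercase only (invalid)
    "PASSWORD123",   # no lowercase letter or symbol (invalid)
    "Pass123",       # no symbol (invalid)
    "Weak1!"         # fewer than 8 characters (invalid)
]

for pwd in passwords:
    if re.match(password_pattern, pwd):
        print(f"✓ '{pwd}' is a strong password")
    else:
        print(f"✗ '{pwd}' is a weak password")
```

This pattern combines several positive lookaheads (?=...). Each one is evaluated from the start of the string and checks one condition without consuming characters; .{8,} then enforces the length.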
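
The lookahead version answers only yes or no. As a sketch of the same rules written as separate checks, which can also report what is missing, consider the following. The `CHECKS` table and the `missing_requirements` helper are names invented for this illustration.

```python
import re

# One small pattern per rule, mirroring the lookaheads in the combined pattern
CHECKS = {
    'uppercase': r'[A-Z]',
    'lowercase': r'[a-z]',
    'digit': r'\d',
    'symbol': r'[@$!%*?&]',
    'length': r'^.{8,}$',
}

def missing_requirements(password):
    """Return the names of the rules that the password does not satisfy."""
    return [name for name, pat in CHECKS.items() if not re.search(pat, password)]

for pwd in ["Pass123!", "PASSWORD123", "Weak1!"]:
    print(pwd, missing_requirements(pwd))
```

Separate checks are longer but make it easy to tell the user which rule failed.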

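Patterns 47-50: Full-width and half-width characters

```python
import re

# Patterns for several kinds of Japanese characters
japanese_patterns = {
    'Full-width katakana': r'[ァ-ヶー]+',
    'Half-width katakana': r'[ｦ-ﾟ]+',
    'Hiragana': r'[ぁ-ん]+',
    'Kanji': r'[一-龯]+'
}

test_text = "こんにちは、カタカナ、ｶﾀｶﾅ、漢字123"

# Extract with each pattern
for name, pattern in japanese_patterns.items():
    matches = re.findall(pattern, test_text)
    print(f"{name}: {matches}")

# Check whether a string consists only of full-width katakana
# (unicodedata.normalize('NFKC', ...) converts half-width katakana to full-width;
#  the reverse direction needs a mapping table or a third-party library)
katakana_text = "プログラミング"
if re.fullmatch(japanese_patterns['Full-width katakana'], katakana_text):
    print(f"'{katakana_text}' is full-width katakana")
```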
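
Width conversion itself is outside the scope of regular expressions. As a minimal sketch, `unicodedata.normalize()` with the NFKC form converts half-width katakana to full-width, after which the full-width pattern applies. The sample string is made up for illustration.

```python
import re
import unicodedata

half = "ｶﾀｶﾅ"

# NFKC folds compatibility characters, including half-width katakana, to their standard forms
full = unicodedata.normalize("NFKC", half)

is_full_width = re.fullmatch(r'[ァ-ヶー]+', full) is not None
print(full, is_full_width)
```

Note that NFKC also changes other characters, for example full-width digits and letters become half-width, so it is best applied only where that is acceptable.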

Practical compound patterns

The following examples combine several techniques.

Parsing a log file

```python
import re

# Parse an Apache-style access log line
log_pattern = r'(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<datetime>[^\]]+)\] "(?P<method>\w+) (?P<path>[^ ]+) HTTP/\d\.\d" (?P<status>\d{3}) (?P<size>\d+)'

log_line = '192.168.1.1 - - [15/Mar/2024:10:30:45 +0900] "GET /index.html HTTP/1.1" 200 1234'

match = re.match(log_pattern, log_line)
if match:
    log_data = match.groupdict()
    print("Parsed result:")
    for key, value in log_data.items():
        print(f"  {key}: {value}")
```

Extracting Markdown links

```python
import re

# Extract links written as [text](URL)
markdown_link_pattern = r'\[([^\]]+)\]\(([^\)]+)\)'

markdown = "See the [official site](https://example.com) and the [documentation](https://docs.example.com)"

links = re.findall(markdown_link_pattern, markdown)
for text, url in links:
    print(f"Link text: {text}")
    print(f"URL: {url}")
    print("---")
```

Masking credit card numbers

```python
import re

# Detect a credit card number (four groups of four digits) and mask it
cc_pattern = r'\b(\d{4})[- ]?(\d{4})[- ]?(\d{4})[- ]?(\d{4})\b'

text = "Card number: 1234-5678-9012-3456 will be charged"

# Mask everything except the last four digits
masked = re.sub(cc_pattern, r'****-****-****-\4', text)
print(f"Masked: {masked}")
# Card number: ****-****-****-3456 will be charged
```

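Performance optimization techniques

These techniques help regular expressions run efficiently.

Reusing compiled patterns

```python
import re

# Compile the pattern ahead of time
email_regex = re.compile(r'^[\w\.-]+@[\w\.-]+\.\w+$')

# Suppose a large number of email addresses must be validated
emails = ["user1@example.com", "user2@test.com", "invalid@"]

# Reuse the compiled object
valid_emails = [email for email in emails if email_regex.match(email)]
print(f"Valid emails: {valid_emails}")
```

Compiling parses the pattern once, up front. The re module also caches recently used patterns internally, so the speed-up is usually modest; the main benefits are a named, reusable object and no repeated cache lookups in tight loops.

Using non-greedy matching

```python
import re

# Greedy: .* takes as much as possible, so it can span several tags
greedy = r'<div>.*</div>'

# Non-greedy: .*? stops at the first closing tag
non_greedy = r'<div>.*?</div>'

html = "<div>Content 1</div><div>Content 2</div>"

# The greedy pattern captures everything as one match
print(f"Greedy: {re.findall(greedy, html)}")
# ['<div>Content 1</div><div>Content 2</div>']

# The non-greedy pattern captures each element separately
print(f"Non-greedy: {re.findall(non_greedy, html)}")
# ['<div>Content 1</div>', '<div>Content 2</div>']
```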

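Best practices for debugging and testing

These techniques help when debugging regular expressions.

Improving readability with verbose mode

```python
import re

# Write a complex pattern with comments
email_pattern = re.compile(r'''
    ^                      # start of the string
    [\w\.-]+               # user name part
    @                      # at sign
    [\w\.-]+               # domain name
    \.                     # dot
    \w+                    # top-level domain
    $                      # end of the string
''', re.VERBOSE)

test_email = "user@example.com"
if email_pattern.match(test_email):
    print(f"'{test_email}' is a valid email address")
```

With the re.VERBOSE flag, whitespace in the pattern is ignored and # starts a comment. To match a literal space or #, escape it or put it inside a character class.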
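
Because verbose mode ignores whitespace in the pattern, a literal space has to be written explicitly. The sketch below shows one way to do that with a character class; the name format and the sample input are made up for illustration.

```python
import re

name_pattern = re.compile(r'''
    (?P<first>\w+)   # first name
    [ ]              # a literal space, written as a character class
    (?P<last>\w+)    # last name
''', re.VERBOSE)

m = name_pattern.fullmatch("Ada Lovelace")
print(m.groupdict())

# Without the character class the space is silently dropped from the pattern
ignored = re.compile(r'\w+ \w+', re.VERBOSE).fullmatch("Ada Lovelace")
print(ignored)
```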

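Inspecting match results in detail

```python
import re

pattern = r'(\w+)@(\w+)\.(\w+)'
text = "contact@example.com"

match = re.search(pattern, text)
if match:
    print(f"Whole match: {match.group(0)}")
    print(f"Group 1 (user name): {match.group(1)}")
    print(f"Group 2 (domain): {match.group(2)}")
    print(f"Group 3 (TLD): {match.group(3)}")
    print(f"Match position: {match.start()}-{match.end()}")
    print(f"All groups: {match.groups()}")
```

Summary

This article walked through 50 practical regular expression patterns for Python in a systematic order.

Key points covered

Basic matching: exact, partial, prefix, and suffix matches
Character classes and quantifiers: \d, \w, \s and repetition control with *, +, {n,m}
Grouping: capturing groups, non-capturing groups, and named groups
Advanced techniques: lookahead and lookbehind, boundary matching, compound patterns
Practical patterns: validating email addresses, URLs, phone numbers, dates, IP addresses, and more
Performance: reusing compiled patterns and choosing between greedy and non-greedy matching
Debugging: verbose mode and inspecting match details

The overall picture of working with regular expressions

```mermaid
flowchart TD
  start["Learning regular expressions"] --> basic["Basic patterns<br/>exact and partial match"]
  basic --> chars["Character classes<br/>\d \w \s etc."]
  chars --> quant["Quantifiers<br/>+ * ? {n,m}"]
  quant --> group["Grouping<br/>() (?:) (?P&lt;name&gt;)"]
  group --> advanced["Advanced techniques<br/>lookahead and lookbehind"]
  advanced --> practical["Practical patterns<br/>email, URL, phone numbers"]
  practical --> optimize["Optimization<br/>compile() and non-greedy"]
  optimize --> master["Regular expression mastery"]
```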