Regular expressions in Python are best learned by tying each rule to a small automation script. Start with the smallest function or script that works, observe the output, and then add one realistic constraint so the concept becomes practical.
The key habit for this lesson is to watch the input string and the returned object (a match object, a list, or None) as they change. That makes regex code easier to debug, easier to explain in interviews, and easier to use in real code without memorizing isolated syntax.
A regular expression (regex) is a pattern used to match, search, and manipulate text. Python's re module provides full regex support.
| Function | Description |
|---|---|
| re.match(pattern, string) | Match at the beginning of string |
| re.search(pattern, string) | Search anywhere in string |
| re.findall(pattern, string) | Return all matches as a list |
| re.finditer(pattern, string) | Return iterator of match objects |
| re.sub(pattern, repl, string) | Replace matches with repl |
| re.split(pattern, string) | Split string by pattern |
| re.compile(pattern) | Compile pattern for reuse |
import re

text = "The price is $25.99 and $10.50"

# search - find first match anywhere
match = re.search(r"\d+\.\d+", text)
if match:
    print(match.group())  # 25.99
    print(match.start())  # 14 (start index)
    print(match.end())    # 19 (end index)

# findall - find all matches
prices = re.findall(r"\$\d+\.\d+", text)
print(prices)  # ['$25.99', '$10.50']

# sub - replace matches
clean = re.sub(r"\$\d+\.\d+", "[PRICE]", text)
print(clean)  # The price is [PRICE] and [PRICE]

# split - split by pattern
sentence = "one,two;three four"
parts = re.split(r"[,; ]+", sentence)
print(parts)  # ['one', 'two', 'three', 'four']
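The function table lists re.match and re.finditer, which the example above does not exercise. The following is a minimal sketch of how they differ from re.search, reusing the same sample text; the variable names (price_pattern, at_start, hits) are illustrative only.

```python
import re

text = "The price is $25.99 and $10.50"
price_pattern = r"\$\d+\.\d+"

# re.match only tries position 0, so a price in mid-sentence is not found.
at_start = re.match(price_pattern, text)
print(at_start)  # None

# re.search scans forward and stops at the first hit.
first = re.search(price_pattern, text)
print(first.group())  # $25.99

# re.finditer yields one match object per hit, so positions stay available.
hits = [(m.group(), m.span()) for m in re.finditer(price_pattern, text)]
print(hits)  # [('$25.99', (13, 19)), ('$10.50', (24, 30))]
```

re.finditer is usually the better fit when you need both the matched text and where it was found; re.findall returns only strings.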
| Pattern | Matches | Example |
|---|---|---|
| . | Any character (except newline) | a.c -> "abc", "a1c" |
| ^ | Start of string (or of each line with re.MULTILINE) | ^Hello |
| $ | End of string (or just before a trailing newline) | world$ |
| * | 0 or more of the preceding item | ab* -> "a", "ab", "abb" |
| + | 1 or more of the preceding item | ab+ -> "ab", "abb" |
| ? | 0 or 1 of the preceding item (optional) | colou?r -> "color", "colour" |
| {n} | Exactly n times | \d{4} -> "2024" |
| {n,m} | Between n and m times | \d{2,4} |
| [abc] | Any of a, b, c | [aeiou] |
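| [^abc] | Any character except a, b, or c | [^0-9] |
| \d | Digit ([0-9]; other Unicode digits too unless re.ASCII is set) | \d+ -> "123" |
| \D | Non-digit | \D+ -> "abc" |
| \w | Word char ([a-zA-Z0-9_]; Unicode letters and digits too unless re.ASCII is set) | \w+ |
| \W | Non-word char | \W+ -> ", " |
| \s | Whitespace | \s+ |
| \S | Non-whitespace | \S+ -> "word" |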
| \b | Word boundary | \bword\b |
| (abc) | Capture group | (\d+)-(\d+) |
| a\|b | a or b | cat\|dog |
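A few rows of the table can be checked directly. This is a small sketch with a made-up sample string; it also uses a non-capturing group, (?:...), which is not listed in the table, so that re.findall returns whole matches rather than group contents.

```python
import re

samples = "color colour colr 2024 42 cat dog catalog"

# ? makes the preceding character optional.
optional_u = re.findall(r"colou?r", samples)
print(optional_u)  # ['color', 'colour']

# {4} repeats exactly four times; \b stops the match from starting or ending mid-word.
four_digits = re.findall(r"\b\d{4}\b", samples)
print(four_digits)  # ['2024']

# | picks between alternatives; \b keeps "catalog" out of the result.
pets = re.findall(r"\b(?:cat|dog)\b", samples)
print(pets)  # ['cat', 'dog']

# ^ and $ anchor the pattern to the start and end of the string.
starts_with_color = bool(re.search(r"^color", samples))
ends_with_dog = bool(re.search(r"dog$", samples))
print(starts_with_color, ends_with_dog)  # True False
```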
import re

# Capture groups with ()
date_str = "Today is 2024-06-15"
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", date_str)
if match:
    print(match.group(0))  # 2024-06-15 (full match)
    print(match.group(1))  # 2024 (year)
    print(match.group(2))  # 06 (month)
    print(match.group(3))  # 15 (day)

# Named groups
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", date_str)
if match:
    print(match.group("year"))   # 2024
    print(match.group("month"))  # 06
    print(match.groupdict())     # {'year': '2024', 'month': '06', 'day': '15'}

# findall with groups returns list of tuples
text = "John: 25, Alice: 30, Bob: 22"
results = re.findall(r"(\w+): (\d+)", text)
print(results)  # [('John', '25'), ('Alice', '30'), ('Bob', '22')]
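Groups are also useful on the replacement side of re.sub. The sketch below reuses the date and name patterns from the example above and refers back to groups with \g<name> and \1, \2; the output formats are arbitrary choices for illustration.

```python
import re

date_str = "Today is 2024-06-15"
date_pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# \g<name> in the replacement inserts the text captured by a named group.
reordered = re.sub(date_pattern, r"\g<day>/\g<month>/\g<year>", date_str)
print(reordered)  # Today is 15/06/2024

# \1 and \2 refer to numbered groups in the order their parentheses open.
swapped = re.sub(r"(\w+): (\d+)", r"\2 (\1)", "John: 25, Alice: 30")
print(swapped)  # 25 (John), 30 (Alice)
```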
import re

# Email validation (a simple shape check, not full RFC validation)
def is_valid_email(email: str) -> bool:
    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    return bool(re.match(pattern, email))

print(is_valid_email("user@example.com"))  # True
print(is_valid_email("invalid-email"))     # False

# Phone number extraction
# The pattern spells out the expected shape so that surrounding spaces are not captured.
text = "Call us at 555-123-4567 or (800) 555-0199"
phones = re.findall(r"\(?\d{3}\)?[-\s]?\d{3}-\d{4}", text)
print(phones)  # ['555-123-4567', '(800) 555-0199']

# URL extraction
html = '<a href="https://example.com">Link</a> and <a href="http://test.org">Test</a>'
urls = re.findall(r'https?://[^\s"]+', html)
print(urls)  # ['https://example.com', 'http://test.org']

# Password strength check
def check_password(pwd: str) -> dict:
    return {
        "length": len(pwd) >= 8,
        "uppercase": bool(re.search(r"[A-Z]", pwd)),
        "lowercase": bool(re.search(r"[a-z]", pwd)),
        "digit": bool(re.search(r"\d", pwd)),
        "special": bool(re.search(r"[!@#$%^&*]", pwd)),
    }

result = check_password("MyPass123!")
print(result)  # every check is True for this password

# Compile once when a pattern is reused (keeps it in one place and skips per-call cache lookups)
email_re = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
emails = ["a@b.com", "bad", "x@y.org"]
valid = [e for e in emails if email_re.match(e)]
print(valid)  # ['a@b.com', 'x@y.org']
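One subtlety with validation patterns: $ also matches just before a final newline, so re.match with ^...$ accepts a string that ends in "\n". A possible alternative, sketched here with the same email pattern, is re.fullmatch, which requires the whole string to match. The names EMAIL_RE and candidate are illustrative.

```python
import re

# Same character classes as the email pattern above, without the ^ and $ anchors.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

candidate = "user@example.com\n"  # note the trailing newline

# ^...$ with re.match lets the trailing newline through.
loose = bool(re.match(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$", candidate))
print(loose)  # True

# fullmatch must consume every character, including the newline, so it rejects this input.
strict = bool(EMAIL_RE.fullmatch(candidate))
print(strict)  # False

print(bool(EMAIL_RE.fullmatch("user@example.com")))  # True
```

Whether the stricter behavior is wanted depends on where the input comes from; stripping whitespace before validating is another reasonable choice.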
Use a regular expression when the program needs a clear answer to a specific text-matching problem, not because the syntax looks familiar. In a real Python task, first name the input, then name the transformation, then name the output. This small discipline shows whether the pattern is being used correctly or only copied from an example.
A reliable practice flow is: create the smallest working function or script, add one normal case, add one edge case (empty matches, greedy groups, or escaped characters), and then confirm the result by reading any traceback and printing the returned object. If the result surprises you, reduce the code until the behavior is visible again.
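The three edge cases named above can each be reproduced in a few lines. This is a minimal sketch with made-up inputs, not an exhaustive list of pitfalls.

```python
import re

# Greedy vs. lazy: .* takes as much as it can, .*? takes as little as it can.
html = "<b>one</b> and <b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)
print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']

# Empty matches: * allows zero repetitions, so every position "matches" with an empty string.
empty_hits = re.findall(r"\d*", "ab")
real_hits = re.findall(r"\d+", "ab")
print(empty_hits)  # ['', '', '']
print(real_hits)   # []

# Escaped characters: $ and . are metacharacters, so a literal price needs escaping.
price = "$5.00"
unescaped = re.search(price, "cost: $5.00")
escaped = re.search(re.escape(price), "cost: $5.00")
print(unescaped)        # None
print(escaped.group())  # $5.00
```

re.escape is the safer route whenever the text to find comes from a variable or user input rather than from a pattern you wrote by hand.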
The most common trap here is writing a pattern that matches too much. Avoid it by writing one sentence before the code that explains why a regular expression is the right choice and what it should and should not match. After the code runs, verify the lesson by testing the pattern against both matching and non-matching strings.
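One way to run that check is to keep two short lists next to the pattern and assert against both. The sketch below assumes a price format with exactly two decimal places; PRICE_RE and the sample strings are hypothetical.

```python
import re

# Assumed rule: a dollar sign, at least one digit, a dot, exactly two digits.
PRICE_RE = re.compile(r"\$\d+\.\d{2}")

should_match = ["$25.99", "$0.50", "$1000.00"]
should_not_match = ["25.99", "$25", "$.99", "$25.9", "USD 25.99"]

for s in should_match:
    assert PRICE_RE.fullmatch(s), f"expected a match: {s!r}"

for s in should_not_match:
    assert not PRICE_RE.fullmatch(s), f"expected no match: {s!r}"

print("all pattern checks passed")
```

The non-matching list is where an over-broad pattern shows up, so it is worth making it at least as long as the matching one.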
Writing a pattern that matches too much: write the expected behavior first, then make the example prove it.
Practicing only the perfect input: also test empty matches, greedy groups, and escaped characters before considering the lesson complete.
Looking only at the final output: trace the input string and the returned object through each important step.
Use a regular expression when the problem matches the behavior shown in the examples and when the result can be verified by reading tracebacks and printing the returned object.
Start with a tiny case, then test empty matches, greedy groups, and escaped characters. The main warning sign is a pattern that matches too much.
Trace the input string and the returned object, predict the result, run the example, and compare your prediction with the actual output.