Regular Expressions in Python: the re Module

Regular expressions in Python are best learned by tying each rule to a concrete automation script. Start with the smallest function or script that works, observe its output, and then add one realistic constraint so the concept becomes practical.

The key habit for this lesson is to watch the input string and the returned object (a match object, a list, or a new string) at every step. That makes regular expressions easier to debug, easier to explain in interviews, and easier to use in real code without memorizing isolated syntax.

What are Regular Expressions?

A regular expression (regex) is a pattern used to match, search, and manipulate text. Python's re module, part of the standard library, provides regex support without any extra installation.

re Module Functions

Function	Description
re.match(pattern, string)	Match at the beginning of string
re.search(pattern, string)	Search anywhere in string
re.findall(pattern, string)	Return all matches as a list
re.finditer(pattern, string)	Return iterator of match objects
re.sub(pattern, repl, string)	Replace matches with repl
re.split(pattern, string)	Split string by pattern
re.compile(pattern)	Compile pattern for reuse

Basic Functions

import re

text = "The price is $25.99 and $10.50"

# search - find first match anywhere
match = re.search(r"\d+\.\d+", text)
if match:
    print(match.group())   # 25.99
    print(match.start())   # 14 (start index)
    print(match.end())     # 19 (end index)

# findall - find all matches
prices = re.findall(r"\$\d+\.\d+", text)
print(prices)   # ['$25.99', '$10.50']

# sub - replace matches
clean = re.sub(r"\$\d+\.\d+", "[PRICE]", text)
print(clean)    # The price is [PRICE] and [PRICE]

# split - split by pattern
sentence = "one,two;three four"
parts = re.split(r"[,; ]+", sentence)
print(parts)    # ['one', 'two', 'three', 'four']
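
The table also lists re.match, re.finditer, and re.compile, which the block above does not exercise. The following is a minimal sketch of how they behave; the sample text and the variable names are made up for illustration.

```python
import re

text = "Order 66 shipped, order 77 pending"

# re.match only succeeds if the pattern matches at index 0.
at_start = re.match(r"\d+", text)    # None: the text starts with "Order"
anywhere = re.search(r"\d+", text)   # first match anywhere in the string

# re.finditer yields match objects, so positions are available as well.
found = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

# A compiled pattern exposes the same operations as methods.
number_re = re.compile(r"\d+")
numbers = number_re.findall(text)

print(at_start)          # None
print(anywhere.group())  # 66
print(found)             # [('66', 6), ('77', 24)]
print(numbers)           # ['66', '77']
```

A frequent source of confusion is expecting re.match to scan the whole string; when the position of the match is not known in advance, re.search is usually the safer choice.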

Regex Pattern Syntax

Pattern	Matches	Example
.	Any character (except newline)	a.c -> "abc", "a1c"
^	Start of string	^Hello
$	End of string	world$
*	0 or more	ab* -> "a", "ab", "abb"
+	1 or more	ab+ -> "ab", "abb"
?	0 or 1 (optional)	colou?r -> "color", "colour"
{n}	Exactly n times	\d{4} -> "2024"
{n,m}	Between n and m times	\d{2,4}
[abc]	Any of a, b, c	[aeiou]
[^abc]	Not a, b, or c	[^0-9]
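\d	Digit ([0-9] with re.ASCII; any Unicode digit by default)	\d+ -> "123"
\D	Non-digit
\w	Word char ([a-zA-Z0-9_] with re.ASCII; Unicode letters and digits too by default)	\w+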
\W	Non-word char
\s	Whitespace	\s+
\S	Non-whitespace
\b	Word boundary	\bword\b
(abc)	Capture group	(\d+)-(\d+)
a|b	a or b	cat|dog
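
A few rows of the table can be checked directly with re.findall. This is a small sketch with invented sample strings; the non-capturing group (?:...) is not in the table and is used here only so that findall returns whole matches instead of group contents.

```python
import re

# "?" makes the preceding character optional (zero or one "u").
spellings = re.findall(r"colou?r", "color colour colouur")

# "\b" keeps "cat" from matching inside "concatenate".
whole_words = re.findall(r"\bcat\b", "cat concatenate cat.")

# "|" chooses between alternatives; "s?" allows an optional plural.
animals = re.findall(r"\b(?:cat|dog)s?\b", "cats and dog, not dogma")

# "{4}" requires exactly four digits; "\b" rejects longer digit runs.
years = re.findall(r"\b\d{4}\b", "In 2024, 12345 items, 99 left")

print(spellings)    # ['color', 'colour']
print(whole_words)  # ['cat', 'cat']
print(animals)      # ['cats', 'dog']
print(years)        # ['2024']
```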

Groups and Capturing

Groups

import re

# Capture groups with ()
date_str = "Today is 2024-06-15"
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", date_str)
if match:
    print(match.group(0))  # 2024-06-15 (full match)
    print(match.group(1))  # 2024 (year)
    print(match.group(2))  # 06   (month)
    print(match.group(3))  # 15   (day)

# Named groups
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", date_str)
if match:
    print(match.group("year"))   # 2024
    print(match.group("month"))  # 06
    print(match.groupdict())     # {'year': '2024', 'month': '06', 'day': '15'}

# findall with groups returns list of tuples
text = "John: 25, Alice: 30, Bob: 22"
results = re.findall(r"(\w+): (\d+)", text)
print(results)  # [('John', '25'), ('Alice', '30'), ('Bob', '22')]
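
Captured groups can also be reused in the replacement string of re.sub. The sketch below reorders the date from the example above; the output formats are arbitrary choices made for illustration.

```python
import re

date_str = "Today is 2024-06-15"

# \1, \2, \3 refer to the numbered groups inside the replacement string.
day_first = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date_str)

# \g<name> refers to a named group.
named = re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"\g<day>.\g<month>.\g<year>",
    date_str,
)

print(day_first)  # Today is 15/06/2024
print(named)      # Today is 15.06.2024
```

Writing the replacement as a raw string (r"...") avoids having Python interpret \1 as an escape sequence before re.sub sees it.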

Practical Examples

Real-World Patterns

import re

# Email validation
def is_valid_email(email: str) -> bool:
    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    return bool(re.match(pattern, email))

print(is_valid_email("user@example.com"))   # True
print(is_valid_email("invalid-email"))      # False

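# Phone number extraction
text = "Call us at 555-123-4567 or (800) 555-0199"
# Optional parentheses around the area code, then an optional space or hyphen
phones = re.findall(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}", text)
print(phones)   # ['555-123-4567', '(800) 555-0199']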

# URL extraction
html = '<a href="https://example.com">Link</a> and <a href="http://test.org">Test</a>'
urls = re.findall(r'https?://[^\s"]+', html)
print(urls)   # ['https://example.com', 'http://test.org']

# Password strength check
def check_password(pwd: str) -> dict:
    return {
        "length": len(pwd) >= 8,
        "uppercase": bool(re.search(r"[A-Z]", pwd)),
        "lowercase": bool(re.search(r"[a-z]", pwd)),
        "digit": bool(re.search(r"\d", pwd)),
        "special": bool(re.search(r"[!@#$%^&*]", pwd)),
    }

result = check_password("MyPass123!")
print(result)
# {'length': True, 'uppercase': True, 'lowercase': True, 'digit': True, 'special': True}

# Compile for reuse (faster when used many times)
email_re = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
emails = ["a@b.com", "bad", "x@y.org"]
valid = [e for e in emails if email_re.match(e)]
print(valid)  # ['a@b.com', 'x@y.org']
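
One detail of validation with ^ and $ is worth checking: $ also matches just before a trailing newline, so a string ending in "\n" can still pass. The sketch below compares that behavior with re.fullmatch, which may be a stricter option for validation; the test string is invented for illustration.

```python
import re

# Same email pattern as above, without the ^ and $ anchors.
email_re = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

with_newline = "user@example.com\n"

# "$" accepts the position just before a final newline, so this matches.
anchored = re.match(
    r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$", with_newline
)

# fullmatch requires the entire string, newline included, to fit the pattern.
strict = email_re.fullmatch(with_newline)
clean = email_re.fullmatch("user@example.com")

print(bool(anchored))  # True
print(bool(strict))    # False
print(bool(clean))     # True
```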

Applied guide for regular expressions

Use a regular expression when the program needs a clear answer to a specific text-matching problem, not because the syntax looks familiar. In a real Python task, first name the input, then name the transformation, then name the output. This small discipline shows whether the pattern is being used correctly or only copied from an example.

A reliable practice flow is: create the smallest working function or script, add one normal case, add one edge case (empty matches, greedy groups, or escaped characters), and then confirm the result by reading any traceback and printing the returned values. If the result surprises you, reduce the code until the behavior is visible again.

The most common trap here is writing a pattern that matches too much. Avoid it by writing one sentence before the code that states exactly what the pattern should and should not match. After the code runs, verify it by testing the pattern against both matching and non-matching strings.
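
The three edge cases named above can each be reproduced in a line or two. This is a sketch with invented sample strings, not an exhaustive list of pitfalls.

```python
import re

# Empty matches: "*" allows zero repetitions, so findall reports empty strings.
empty = re.findall(r"\d*", "a1")

# Greedy groups: ".*" grabs as much as possible; ".*?" stops at the first fit.
html = "<b>one</b> and <b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)
lazy = re.findall(r"<b>.*?</b>", html)

# Escaped characters: parentheses are group syntax unless escaped.
note = "Total: 3.50 (approx)"
as_group = re.search(r"(approx)", note).group()   # the parentheses are not matched
literal = re.escape("(approx)")                    # escapes the special characters
as_text = re.search(literal, note).group()

print(empty)     # ['', '1', '']
print(greedy)    # ['<b>one</b> and <b>two</b>']
print(lazy)      # ['<b>one</b>', '<b>two</b>']
print(as_group)  # approx
print(as_text)   # (approx)
```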

Identify the exact problem the regular expression solves.
Trace the input value and the returned object before and after the main operation.
Keep one intentionally broken version and explain the fix.
Connect the example to an automation script so the idea feels concrete.
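
One way to follow this checklist is a small table of strings that should and should not match. The sketch below assumes a date pattern as the example problem; the lists and names are hypothetical.

```python
import re

# Expectation: accept a date written as YYYY-MM-DD and nothing else.
date_re = re.compile(r"\d{4}-\d{2}-\d{2}")

should_match = ["2024-06-15", "1999-12-31"]
should_not_match = ["", "2024-6-15", "2024-06-15 extra", "x2024-06-15"]

for s in should_match:
    assert date_re.fullmatch(s), f"expected a match for {s!r}"
for s in should_not_match:
    assert date_re.fullmatch(s) is None, f"expected no match for {s!r}"

# Trace the input value and the returned object for one case.
m = date_re.fullmatch("2024-06-15")
print(repr("2024-06-15"), "->", m)
```

If one of the assertions fails, the message names the offending string, which gives the intentionally broken version and its fix a concrete starting point.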

Key Takeaways

I can explain where a regular expression fits inside an automation script.
I can point to the exact input value and returned object affected by each re function.
I tested a normal case and an edge case involving empty matches, greedy groups, and escaped characters.
I verified the result by reading tracebacks and printing values instead of assuming it worked.
I can describe the main mistake: writing a pattern that matches too much.

Common Mistakes to Avoid

WRONG Writing a pattern that matches too much.

RIGHT Write the expected behavior first, then make the example prove it.

A one-line expectation turns the code from copied syntax into a testable idea.
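
As an illustration of this mistake, the sketch below states the expectation as a comment and then compares a pattern with an unescaped dot against the corrected one. The sample text is invented.

```python
import re

text = "v1.2 released, build 10-20, ratio 3:4"

# Expectation: extract only decimal numbers written with a literal dot.

# WRONG: an unescaped "." matches any character, so "-" and ":" slip through.
too_broad = re.findall(r"\d+.\d+", text)

# RIGHT: "\." matches only a literal dot.
fixed = re.findall(r"\d+\.\d+", text)

print(too_broad)  # ['1.2', '10-20', '3:4']
print(fixed)      # ['1.2']
```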

WRONG Practicing only the perfect input.

RIGHT Also test empty matches, greedy groups, and escaped characters before considering the lesson complete.

The edge case is where most interview follow-up questions begin.

WRONG Looking only at the final output.

RIGHT Trace the input value and the returned object through each important step.

Tracing makes debugging faster because you can see the first incorrect state.
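
A traced version of a small extraction might look like the sketch below. The helper name trace_usage and the log lines are hypothetical; the point is that each intermediate state is printed before the next step uses it.

```python
import re

def trace_usage(line: str):
    """Hypothetical helper: extract a percentage, printing each state."""
    print("input:   ", repr(line))
    match = re.search(r"(\d+)%", line)
    print("match:   ", match)       # a match object, or None
    if match is None:
        return None
    raw = match.group(1)
    print("captured:", repr(raw))   # still a string at this point
    usage = int(raw)
    print("result:  ", usage)       # converted to int
    return usage

ok = trace_usage("2024-06-15 ERROR disk usage 91%")
missing = trace_usage("2024-06-15 INFO all clear")
```

Here ok is 91 and missing is None; the printed trace for the second call stops at the match step, which is the first state that differs from the successful run.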

Practice Tasks

Build one small function or script that demonstrates a regular expression in an automation script.
Change the example to include empty matches, greedy groups, and escaped characters and record the difference.
Break the example by deliberately writing a pattern that matches too much, then write the corrected version.
Explain the finished example in five bullet points: input, operation, output, failure case, and verification.

Frequently Asked Questions

When should I use regular expressions?

Use them when the task is to find, validate, extract, or replace text that follows a describable pattern, and when the result can be verified by printing the returned values and reading any traceback.

How do I avoid mistakes in regular expressions?

Start with a tiny case, then test empty matches, greedy groups, and escaped characters. The main warning sign is writing a pattern that matches too much.

How can I revise regular expressions quickly?

Trace the input value and the returned object, predict the result, run the example, and compare your prediction with the actual output.
