JavaScript Regular Expressions: Patterns · Flags · Characters · Meta Characters · Assertions · Quantifiers · Groups · RegExp Object · Methods
📑 Table of Contents
- Background: What Are Regular Expressions?
- Topic 1 — Creating Regular Expressions
- Topic 2 — Flags (Modifiers)
- Topic 3 — Character Classes & Ranges
- Topic 4 — Meta Characters
- Topic 5 — Assertions (Anchors & Lookarounds)
- Topic 6 — Quantifiers
- Topic 7 — Groups & Patterns
- Topic 8 — The RegExp Object
- Topic 9 — RegExp & String Methods
- Applied Exercises
- Mini Project — Form Validator & Text Processor
- Completion Checklist
1. Background: What Are Regular Expressions?
Imagine you receive a form submission with thousands of email addresses and need to check which ones are valid. Or you have a document full of phone numbers in inconsistent formats — (080) 123-4567, 080-123-4567, 08012345678 — and need to extract and standardise them all.
Doing this with normal string methods (indexOf, includes, slice) would require dozens of lines of complex logic. A regular expression (RegExp or regex) solves it in a single pattern.
// ❌ Without regex — checking one simple email condition takes many lines
function roughEmailCheck(email) {
const hasAt = email.includes("@");
const hasDot = email.includes(".");
const atIndex = email.indexOf("@");
const dotAfter = email.lastIndexOf(".") > atIndex;
return hasAt && hasDot && dotAfter;
}
// ✅ With regex — a full email pattern in one expression
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
console.log(emailPattern.test("alice@example.com")); // true
console.log(emailPattern.test("not-an-email")); // false
A regular expression is a sequence of characters that defines a search pattern. It describes what to look for in a string — not by spelling out exact text, but by specifying a pattern of characters.
What Can You Do with Regular Expressions?
| Task | Example |
|---|---|
| Test if a string matches a pattern | Is this a valid email? |
| Find all matches in a string | Find every phone number in a document |
| Extract specific parts of a match | Get the area code from each phone number |
| Replace matched text | Censor words, reformat dates |
| Split a string on a pattern | Split CSV respecting quoted commas |
| Validate user input | Forms, APIs, data pipelines |
🏢 REAL WORLD: Regex is used by every major programming language and tool. In professional JavaScript, it appears in form validation, search engines, URL routers, log parsers, code editors (search & replace), data cleaning pipelines, and security filters (blocking SQL injection patterns).
Anatomy of a Regular Expression
/pattern/flags
/  \d{3}-\d{4}  /  g
↑       ↑       ↑  ↑
|    pattern    |  flag(s)
delimiter       delimiter
- The delimiters (/) mark the start and end, like quotes for strings
- The pattern is what to search for
- The flags (optional) modify how the search works (e.g. case-insensitive, global)
2. Topic 1 — Creating Regular Expressions
Phase 1 — Conceptual Understanding
There are two ways to create a regex in JavaScript. Understanding the difference is important for choosing the right one.
Method 1 — Literal Syntax (Preferred)
Use forward slashes to wrap the pattern. The pattern is compiled at load time — faster and cleaner for fixed patterns.
const pattern = /hello/;
console.log(pattern.test("say hello world")); // true
console.log(pattern.test("say goodbye")); // false
💡 TIP: Always prefer literal syntax unless the pattern is built dynamically at runtime. Literals are faster, cleaner, and checked for syntax errors when the script loads.
Method 2 — new RegExp() Constructor
Pass the pattern as a string. The pattern is compiled at runtime, which is required when the pattern is built from variables.
// Static pattern — same as /hello/
const pattern1 = new RegExp("hello");
// Dynamic pattern — built from a variable
const word = "world";
const pattern2 = new RegExp(word);
console.log(pattern1.test("say hello")); // true
console.log(pattern2.test("hello world")); // true
⚠️ WATCH OUT: Escaping in Constructor Strings. In a regex literal, \d means “a digit”. In a constructor string, the backslash must be doubled, because "\d" in a JavaScript string is just "d" (an unrecognised escape, so the backslash is dropped).
// These are EQUIVALENT:
const a = /\d+/; // literal: \d is the regex metachar
const b = new RegExp("\\d+"); // string: \\ is needed to produce \d
console.log(a.test("abc123")); // true
console.log(b.test("abc123")); // true
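When the dynamic part of a pattern comes from user input, any regex metacharacters in it are still read as pattern syntax. Here is a minimal sketch of one common way to handle this. It assumes a hypothetical helper named escapeRegExp (a name chosen for this illustration, not a built-in) that backslash-escapes the special characters before the string reaches the constructor:

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in `text`
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& = the matched character
}

const userInput = "1+1=2"; // illustrative input containing the metacharacter "+"

const unescaped = new RegExp(userInput);             // behaves like /1+1=2/
const escaped = new RegExp(escapeRegExp(userInput)); // behaves like /1\+1=2/

console.log(unescaped.test("1+1=2")); // false ("+" is read as a quantifier)
console.log(escaped.test("1+1=2"));   // true
```

Without the escaping step the pattern looks for one or more 1s followed by "1=2", which is not what the user typed.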
Adding Flags at Creation
// Literal — flags after the closing slash
const lit = /hello/gi;
// Constructor — flags as second argument
const con = new RegExp("hello", "gi");
// Both are identical — case-insensitive, global match
console.log("Hello World, hello!".match(lit)); // ["Hello", "hello"]
console.log("Hello World, hello!".match(con)); // ["Hello", "hello"]
3. Topic 2 — Flags (Modifiers)
Phase 1 — Conceptual Understanding
Flags change how the entire regex engine behaves. They are placed after the closing / in a literal, or as the second argument to new RegExp().
i — Case-Insensitive
Makes the entire match ignore uppercase vs lowercase.
const pattern = /hello/i;
console.log(pattern.test("Hello")); // true
console.log(pattern.test("HELLO")); // true
console.log(pattern.test("hElLo")); // true
console.log(pattern.test("goodbye")); // false
Without vs with i:
console.log(/Cat/.test("I have a cat")); // false — uppercase C doesn't match lowercase c
console.log(/Cat/i.test("I have a cat")); // true — i flag ignores case
g — Global (Find ALL Matches)
Without g, methods like match() return only the first match. With g, they return all matches.
const text = "cat and Cat and CAT";
// Without g — only first match
console.log(text.match(/cat/i)); // ["cat"] (just first)
// With gi — ALL matches
console.log(text.match(/cat/gi)); // ["cat", "Cat", "CAT"]
⚠️ WATCH OUT: the g flag and test(). When you use g with a regex stored in a variable and call .test() repeatedly, the regex remembers where its last match ended (lastIndex). This causes alternating true/false results in a loop!
const re = /cat/g;
console.log(re.test("cat")); // true (lastIndex → 3)
console.log(re.test("cat")); // false (search starts at 3, past the end!)
console.log(re.test("cat")); // true (the failed match reset lastIndex to 0)
// Fix: use .test() without g, or reset lastIndex manually
re.lastIndex = 0; // reset before reuse
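The same pitfall shows up when a global regex is reused across different strings, for example inside filter(). This is a small sketch with a made-up word list, comparing a g regex with a plain one:

```javascript
const words = ["cat", "catalog", "dog", "cat"]; // illustrative data

const globalRe = /cat/g; // keeps lastIndex between calls
const plainRe = /cat/;   // always searches from index 0

const withG = words.filter((w) => globalRe.test(w));
const withoutG = words.filter((w) => plainRe.test(w));

console.log(withG);    // ["cat", "cat"] ("catalog" is skipped)
console.log(withoutG); // ["cat", "catalog", "cat"]
```

After the first "cat" succeeds, lastIndex is 3, so the search in "catalog" starts at index 3 and finds nothing.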
m — Multiline
Changes the meaning of ^ (start) and $ (end) anchors. Without m, ^ and $ match the start/end of the entire string. With m, they match the start/end of each line.
const text = "first line\nsecond line\nthird line";
// Without m — ^ only matches start of entire string
console.log(text.match(/^\w+/)); // ["first"]
// With m — ^ matches start of EACH line
console.log(text.match(/^\w+/gm)); // ["first", "second", "third"]
s — Dot-All (Single-Line)
By default, the . metacharacter matches any character except newlines. The s flag makes . match newlines too.
const text = "Hello\nWorld";
console.log(/Hello.World/.test(text)); // false — . doesn't match \n
console.log(/Hello.World/s.test(text)); // true — s flag makes . match \n
d — Generate Indices
Makes exec() and match() include indices — the start/end positions of each match and capture group.
const match = /(\w+)/.exec("hello world");
console.log(match.indices); // undefined — no d flag
const matchD = /(\w+)/d.exec("hello world");
console.log(matchD.indices); // [[0,5],[0,5]] — [full match range, group 1 range]
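The indices are handy when a tool needs to highlight or slice the exact range of a capture. This is a small sketch using a made-up key=value input. With named groups, the ranges are also available under indices.groups:

```javascript
const re = /(?<key>\w+)=(?<value>\d+)/d;
const input = "width=120"; // illustrative input

const m = re.exec(input);
const [start, end] = m.indices.groups.value; // range of the "value" group

console.log(m.indices[0]);            // [0, 9] (the full match)
console.log([start, end]);            // [6, 9]
console.log(input.slice(start, end)); // "120"
```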
u — Unicode Mode
Enables full Unicode support, including matching Unicode code points above U+FFFF and making \p{...} Unicode property escapes available.
// Without u — \uD83D\uDC4D is a surrogate pair (emoji 👍)
console.log(/^.$/.test("👍")); // false — emoji is 2 "characters" without u
console.log(/^.$/u.test("👍")); // true — u mode treats it as 1 code point
// Unicode property escapes (requires u flag)
console.log(/\p{Letter}/u.test("A")); // true
console.log(/\p{Emoji}/u.test("👍")); // true
v — Unicode Sets Mode (ES2024)
An upgrade to u mode — enables set operations inside character classes ([A--B], [A&&B]), Unicode property strings, and improved Unicode handling.
// Intersection — match characters that are both letters AND ASCII
const asciiLetter = /[\p{Letter}&&\p{ASCII}]/v;
console.log(asciiLetter.test("A")); // true
console.log(asciiLetter.test("é")); // false — not ASCII
y — Sticky
The match must start at exactly lastIndex — it does not search forward. Used for incremental parsing.
const re = /\d+/y;
re.lastIndex = 4;
console.log(re.exec("abc 123 456")); // ["123"] — found at exactly index 4
re.lastIndex = 0;
console.log(re.exec("abc 123 456")); // null — position 0 is 'a', not a digit
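To illustrate incremental parsing, here is a minimal sketch of a tokenizer built on the y flag. The function name tokenize and the tiny arithmetic grammar are assumptions made for this example. Each token must start exactly where the previous one ended, so unexpected characters are reported instead of being silently skipped:

```javascript
// Hypothetical tokenizer for a tiny arithmetic language (numbers, + - * / and parentheses)
function tokenize(input) {
  // y: each token must begin exactly at lastIndex; surrounding whitespace is consumed too
  const tokenRe = /\s*(\d+|[+\-*\/()])\s*/y;
  const tokens = [];
  let position = 0;

  while (position < input.length) {
    tokenRe.lastIndex = position;
    const match = tokenRe.exec(input);
    if (match === null) {
      throw new SyntaxError(`Unexpected character at index ${position}`);
    }
    tokens.push(match[1]);         // group 1 holds the token without whitespace
    position = tokenRe.lastIndex;  // continue right after this token
  }
  return tokens;
}

console.log(tokenize("12 + (3*4)")); // ["12", "+", "(", "3", "*", "4", ")"]
```

With g instead of y, the regex would search forward past an invalid character such as "$" and the error would go unnoticed.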
Flags Summary Table
| Flag | Name | Effect |
|---|---|---|
| i | Case-insensitive | Ignore uppercase/lowercase |
| g | Global | Find ALL matches (not just first) |
| m | Multiline | ^ and $ match each line |
| s | Dot-all | . matches newlines too |
| d | Indices | Include start/end positions of matches |
| u | Unicode | Full Unicode support + \p{} escapes |
| v | Unicode sets | Set operations in [], Unicode strings |
| y | Sticky | Match only at exact lastIndex position |
Combining Multiple Flags
Flags can be combined in any order:
// Case-insensitive AND global AND multiline
const pattern = /^hello/gim;
const text = "Hello World\nhello there\nHELLO!";
console.log(text.match(pattern)); // ["Hello", "hello", "HELLO"]
4. Topic 3 — Character Classes & Ranges
Phase 1 — Conceptual Understanding
A character class (also called a character set) lets you match any one character from a defined group. They are written inside square brackets [...].
[abc] — Match Any Listed Character
Matches ONE character that is a, b, or c — any single character in the list.
// [aeiou] matches any single vowel
const vowel = /[aeiou]/;
console.log(vowel.test("apple")); // true (contains 'a')
console.log(vowel.test("sky")); // false (no vowels)
console.log(vowel.test("gym")); // false ('y' not in class)
Find ALL vowels with g:
const text = "Hello World";
console.log(text.match(/[aeiou]/gi)); // ["e", "o", "o"]
▶ Expected Output: ["e", "o", "o"]
[^abc] — Negated Character Class — Match Anything EXCEPT Listed
The ^ inside [...] negates the class — matches any character NOT in the list.
// [^aeiou] matches any NON-vowel character
console.log("Hello".match(/[^aeiou]/gi)); // ["H", "l", "l"]
// [^0-9] matches any non-digit character
console.log("a1b2c3".match(/[^0-9]/g)); // ["a", "b", "c"]
⚠️ WATCH OUT: ^ inside [...] means negation. ^ outside [...] means start-of-string anchor. Two completely different meanings!
/^[abc]/ // Starts with a, b, or c
/[^abc]/ // Any character EXCEPT a, b, c
[a-z] — Character Range
A hyphen - inside [...] defines a range. Matches any character between the two endpoints (inclusive), based on Unicode code points.
// Lowercase letters
console.log(/[a-z]/.test("hello")); // true
console.log(/[a-z]/.test("HELLO")); // false — uppercase
// Uppercase letters
console.log(/[A-Z]/.test("Hello")); // true
// Digits
console.log(/[0-9]/.test("abc5")); // true
console.log(/[0-9]/.test("abcde")); // false
// Letters AND digits combined
console.log(/[a-zA-Z0-9]/.test("a1")); // true
Practical ranges:
// Extract only letters from a messy string
const messy = "H3ll0 W0rld!";
console.log(messy.match(/[a-zA-Z]/g)); // ["H", "l", "l", "W", "r", "l", "d"]
// Remove non-digit characters from phone number
const phone = "(080) 123-4567";
console.log(phone.replace(/[^0-9]/g, "")); // "0801234567"
▶ Expected Output:
["H", "l", "l", "W", "r", "l", "d"]
0801234567
Combining Ranges and Characters
You can mix ranges and individual characters inside one class:
// Alphanumeric + underscore + hyphen
const slug = /^[a-zA-Z0-9_-]+$/;
console.log(slug.test("my-blog-post")); // true
console.log(slug.test("my blog post")); // false (space not allowed)
console.log(slug.test("my_blog_2024")); // true
console.log(slug.test("my@blog!post")); // false (@ and ! not allowed)
💡 TIP: The hyphen - is special inside [...]. To include a literal hyphen, place it at the start or end of the class, or escape it with \-:
/[a-z-]/ // letters a-z OR a literal hyphen (hyphen at end = literal)
/[-a-z]/ // same: hyphen at start = literal
/[a\-z]/ // escaped hyphen: matches only "a", "-", or "z" (no range)
Common Character Class Patterns
// Hex digit (0-9 and a-f and A-F)
const hex = /[0-9a-fA-F]/;
console.log(hex.test("F")); // true
console.log(hex.test("G")); // false
// Filename character (no slashes, colons)
const filename = /^[^/\\:*?"<>|]+$/;
console.log(filename.test("report_2024.pdf")); // true
console.log(filename.test("file/path")); // false
// Valid username: letters, digits, underscore, 3-20 chars
const username = /^[a-zA-Z0-9_]{3,20}$/;
console.log(username.test("alice_99")); // true
console.log(username.test("a")); // false (too short)
console.log(username.test("alice-99")); // false (hyphen not allowed)
5. Topic 4 — Meta Characters
Phase 1 — Conceptual Understanding
Meta characters are special characters in regex that have a meaning beyond their literal value. They are the core building blocks of every pattern.
Shorthand Character Classes
These are the most frequently used meta characters — each represents a common character class:
\d — Any Digit [0-9]
console.log(/\d/.test("abc5def")); // true — contains a digit
console.log(/\d/.test("abcdef")); // false — no digits
// Find all digits in a string
console.log("a1b22c333".match(/\d+/g)); // ["1", "22", "333"]
\D — Any NON-Digit [^0-9]
console.log("a1b22c333".match(/\D+/g)); // ["a", "b", "c"]
\w — Word Character [a-zA-Z0-9_]
Matches letters, digits, and underscore — the characters valid in most identifiers.
console.log(/\w/.test("hello")); // true
console.log(/\w/.test("!@#$")); // false — no word characters
// Extract words
console.log("Hello, World! 2024".match(/\w+/g)); // ["Hello", "World", "2024"]
\W — Any NON-Word Character [^a-zA-Z0-9_]
// Find all punctuation/spaces
console.log("Hello, World!".match(/\W+/g)); // [", ", "!"] (the comma and space form one run)
\s — Any Whitespace [ \t\n\r\f\v]
Matches space, tab, newline, carriage return, form feed, vertical tab, and other Unicode whitespace such as the non-breaking space (\u00A0).
console.log(/\s/.test("hello world")); // true — has a space
console.log(/\s/.test("helloworld")); // false — no whitespace
// Normalise multiple spaces to one
console.log("too    many   spaces".replace(/\s+/g, " ")); // "too many spaces"
\S — Any NON-Whitespace
// Extract non-space "tokens"
console.log(" hello world ".match(/\S+/g)); // ["hello", "world"]
The Dot . — Any Character (Except Newline)
. matches any single character except a newline \n (unless s flag is used).
console.log(/./.test("a")); // true
console.log(/./.test("5")); // true
console.log(/./.test("!")); // true
console.log(/./.test("\n")); // false — newline excluded
console.log(/./s.test("\n")); // true — s flag includes newline
// Pattern: exactly 3 characters starting with 'c'
console.log(/^c..$/.test("cat")); // true
console.log(/^c..$/.test("cart")); // false (4 chars)
🐛 COMMON MISTAKE: Using . where you expect a specific kind of character creates overly broad patterns that match too much. Prefer a specific character class (such as \d or [a-z]) when you know what you expect. Also, to match a literal dot (e.g., in example.com), you must escape it: \.
// ❌ Wrong — dot matches ANY character including commas, slashes, etc.
/www.example.com/ // also matches "wwwXexampleYcom"!
// ✅ Correct — escape the dots to match literal periods
/www\.example\.com/
Escape Character \
Backslash \ turns a special character into a literal one.
// Match a literal dot
console.log(/3\.14/.test("3.14")); // true
console.log(/3\.14/.test("3X14")); // false
// Match a literal dollar sign
console.log(/\$100/.test("$100")); // true
console.log(/\$100/.test("100")); // false
// Match a literal backslash (needs \\)
console.log(/C:\\Users/.test("C:\\Users")); // true
Characters that MUST be escaped to match literally:
. * + ? ^ $ { } [ ] | ( ) \
// Match a literal question mark
console.log(/Are you sure\?/.test("Are you sure?")); // true
// Match a literal parenthesis
console.log(/\(555\)/.test("(555)")); // true
Alternation | — OR
The pipe | matches either the pattern on the left OR the pattern on the right. It is like an “or” for entire sub-expressions.
// Match "cat" OR "dog"
const petPattern = /cat|dog/;
console.log(petPattern.test("I have a cat")); // true
console.log(petPattern.test("I have a dog")); // true
console.log(petPattern.test("I have a bird")); // false
// Find all occurrences
const text = "cat and dog and cat";
console.log(text.match(/cat|dog/g)); // ["cat", "dog", "cat"]
💡 TIP: Alternation has the lowest precedence in regex. Use groups (...) to limit its scope:
/gray|grey/ // "gray" OR "grey" ← correct
/gr(a|e)y/ // "gr" then ("a" OR "e") then "y" ← also correct, different structure
/I love cat|dog/ // "I love cat" OR "dog" ← probably not what you wanted!
/I love (cat|dog)/ // "I love cat" OR "I love dog" ← correct
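The precedence rule matters most when anchors are involved, because each branch of an ungrouped alternation gets only the anchor written on its own side. This short sketch uses illustrative inputs to show the difference:

```javascript
const loose = /^I love cat|dog$/;       // (^I love cat) OR (dog$)
const grouped = /^I love (?:cat|dog)$/; // ^I love, then cat OR dog, then $

console.log(loose.test("hotdog"));       // true (the "dog$" branch matches on its own)
console.log(grouped.test("hotdog"));     // false
console.log(grouped.test("I love dog")); // true
```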
Meta Characters Reference Table
| Meta Char | Meaning | Example | Matches |
|---|---|---|---|
| . | Any char (not newline) | a.b | “aXb”, “a1b” |
| \d | Digit [0-9] | \d\d | “42”, “09” |
| \D | Non-digit | \D+ | “abc”, “!!” |
| \w | Word char [a-zA-Z0-9_] | \w+ | “hello_2” |
| \W | Non-word char | \W+ | “!@#”, “, ” |
| \s | Whitespace | \s+ | “ ”, “\t\n” |
| \S | Non-whitespace | \S+ | “hello”, “42” |
| \b | Word boundary | \bcat\b | “cat” but not “cats” |
| \B | Non-word boundary | \Bcat\B | “cat” inside “tomcats” |
| \n | Newline literal | \n | newline char |
| \t | Tab literal | \t | tab char |
| \r | Carriage return | \r | CR char |
| \0 | Null character | \0 | null char |
| \uXXXX | Unicode code point | \u0041 | “A” |
| \xHH | Hex character | \x41 | “A” |
| \\ | Literal backslash | C:\\ | “C:\” |
| \| | Literal pipe | a\|b | “a”, a pipe character, then “b” |
6. Topic 5 — Assertions (Anchors & Lookarounds)
Phase 1 — Conceptual Understanding
Assertions are zero-width matches — they describe a position in the string rather than consuming characters. They are like “conditions” that must be true at a point in the string for the overall match to succeed.
^ — Start of String (or Line with m)
The pattern must match at the very beginning of the string.
// String must START with "Hello"
console.log(/^Hello/.test("Hello World")); // true
console.log(/^Hello/.test("Say Hello")); // false — "Hello" not at start
// With m flag — start of each LINE
const text = "apple\nbanana\napricot";
console.log(text.match(/^a\w+/gm)); // ["apple", "apricot"]
$ — End of String (or Line with m)
The pattern must match at the very end of the string.
// String must END with ".com"
console.log(/\.com$/.test("example.com")); // true
console.log(/\.com$/.test("example.com/path")); // false
// Validate format: digits only
console.log(/^\d+$/.test("12345")); // true — all digits
console.log(/^\d+$/.test("123a5")); // false — 'a' breaks the pattern
💡 TIP: ^pattern$ (anchoring both ends) is the cornerstone of validation patterns: it ensures the entire string matches, with nothing extra before or after.
\b — Word Boundary
Matches a position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string: the invisible boundary at the edge of a word.
// Match the word "cat" but not "catch", "cats", "tomcat"
const pattern = /\bcat\b/;
console.log(pattern.test("my cat sat")); // true — "cat" as a whole word
console.log(pattern.test("my cats")); // false — "cat" followed by "s"
console.log(pattern.test("tomcat")); // false — "cat" preceded by "tom"
console.log(pattern.test("the cat.")); // true: "." is a non-word character, so there is a boundary after "cat"
🏢 REAL WORLD: \b is essential for whole-word search: find the word “apple” but not “pineapple” or “apples”. The “whole word” option in many text editors behaves much like wrapping the search term in \b.
Extract whole words only:
const text = "category: cats and catfish and a cat";
console.log(text.match(/\bcat\b/g)); // ["cat"] — only the standalone word
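A whole-word search is often built from a variable, which combines \b with the constructor form. Below is a minimal sketch; the helper name replaceWholeWord is an assumption made for this example, and the metacharacter escaping is one common approach rather than the only one:

```javascript
// Hypothetical helper: replace `word` only where it stands alone as a whole word
function replaceWholeWord(text, word, replacement) {
  // Neutralise any regex metacharacters in the search word
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "\\b" in a string literal becomes \b in the compiled pattern
  const wholeWord = new RegExp(`\\b${escaped}\\b`, "g");
  return text.replace(wholeWord, replacement);
}

console.log(replaceWholeWord("cat, catalog, tomcat, cat.", "cat", "dog"));
// "dog, catalog, tomcat, dog."
```

Because \b relies on \w, this only behaves as expected when the search word itself starts and ends with a word character.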
\B — Non-Word Boundary
Matches a position that is NOT a word boundary — inside a word.
// Match "cat" only when it's INSIDE a larger word
const pattern = /\Bcat\B/;
console.log(pattern.test("tomcats")); // true — "cat" inside "tomcats"
console.log(pattern.test("my cat")); // false — "cat" is a whole word here
Lookahead (?=...) — “Followed By”
A positive lookahead asserts that what follows the current position matches the given pattern — without consuming those characters.
// Match "100" only if followed by " dollars"
const pattern = /\d+(?= dollars)/;
console.log(pattern.exec("I have 100 dollars")); // ["100"]
console.log(pattern.exec("I have 100 euros")); // null — not followed by " dollars"
// The match is just "100" — " dollars" is NOT included in the match
const result = /\d+(?= dollars)/.exec("I have 100 dollars");
console.log(result[0]); // "100" ← only the digits, not " dollars"
▶ Expected Output:
["100"]
null
100
💡 TIP: Lookahead is “peek ahead” — it checks what comes next but doesn’t eat it. Think of it as “match X only when it is followed by Y, but don’t include Y in the match.”
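Because a lookahead consumes nothing, it can be used to find positions rather than text. A classic illustration is inserting thousands separators. The sketch below assumes the input is a string made up only of digits, and the helper name addThousandsSeparators is chosen for this example:

```javascript
// Hypothetical helper: insert commas into a string that contains only digits
function addThousandsSeparators(digits) {
  // \B            : a position inside the number (never at its very start)
  // (?=(\d{3})+$) : only where the digits ahead form complete groups of three up to the end
  return digits.replace(/\B(?=(\d{3})+$)/g, ",");
}

console.log(addThousandsSeparators("1234567")); // "1,234,567"
console.log(addThousandsSeparators("1000"));    // "1,000"
console.log(addThousandsSeparators("999"));     // "999"
```

The match itself is empty at each position, so replace() inserts a comma without removing any digit.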
Negative Lookahead (?!...) — “NOT Followed By”
Asserts that what follows does not match the pattern.
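// Match a whole number only if NOT followed by " euros"
// (\b stops the engine from backtracking to "20", which is followed by "0" rather than " euros")
const pattern = /\b\d+\b(?! euros)/g;
const text = "100 dollars and 200 euros and 50 pounds";
console.log(text.match(pattern)); // ["100", "50"] ("200" is followed by " euros")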
▶ Expected Output: ["100", "50"]
Lookbehind (?<=...) — “Preceded By”
A positive lookbehind asserts that what precedes the current position matches the pattern — without consuming those characters.
// Match digits only if preceded by "$"
const pattern = /(?<=\$)\d+/g;
const text = "Price: $100 and €200 and $350";
console.log(text.match(pattern)); // ["100", "350"] — only dollar amounts
▶ Expected Output: ["100", "350"]
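Since the lookbehind keeps the currency symbol out of the match, the results can be converted to numbers directly. This small sketch uses a made-up receipt string and sums only the dollar amounts:

```javascript
const receipt = "Coffee $3.50, bagel $2.25, tip €1.00"; // illustrative data

// Digits with two decimals, but only when a "$" sits immediately before them
const dollarAmounts = receipt.match(/(?<=\$)\d+\.\d{2}/g) ?? [];
const total = dollarAmounts.map(Number).reduce((sum, n) => sum + n, 0);

console.log(dollarAmounts); // ["3.50", "2.25"]
console.log(total);         // 5.75
```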
Negative Lookbehind (?<!...) — “NOT Preceded By”
Asserts that what precedes does not match.
// Match digits NOT preceded by "$"
const text = "I paid $100 and owed 200 more";
// Careful: on its own, the lookbehind still lets the tail of "$100" match
console.log(text.match(/(?<!\$)\d+/g)); // ["00", "200"]
// Add \b so that only whole numbers can match
console.log(text.match(/(?<!\$)\b\d+\b/g)); // ["200"]
Lookaround Summary
| Assertion | Syntax | Meaning |
|---|---|---|
| Positive lookahead | X(?=Y) | Match X followed by Y |
| Negative lookahead | X(?!Y) | Match X NOT followed by Y |
| Positive lookbehind | (?<=Y)X | Match X preceded by Y |
| Negative lookbehind | (?<!Y)X | Match X NOT preceded by Y |
| Start of string | ^ | Match at string start |
| End of string | $ | Match at string end |
| Word boundary | \b | Between word and non-word char |
| Non-word boundary | \B | Not between word and non-word char |
7. Topic 6 — Quantifiers
Phase 1 — Conceptual Understanding
Quantifiers specify how many times the preceding element must occur. Without quantifiers, every character or group matches exactly once.
* — Zero or More
Matches zero or more repetitions of the preceding element. Even zero — the element can be completely absent.
// "ab*c" — 'a', then zero or more 'b', then 'c'
console.log(/ab*c/.test("ac")); // true — zero b's
console.log(/ab*c/.test("abc")); // true — one b
console.log(/ab*c/.test("abbbbc")); // true — four b's
console.log(/ab*c/.test("aXc")); // false — X is not b
⚠️ WATCH OUT: * can match nothing. /a*/ matches any string, even an empty one, because “zero or more a’s” allows empty. This can make patterns match unexpected things.
+ — One or More
Matches one or more repetitions. The element must appear at least once.
// "ab+c" — 'a', then ONE or more 'b', then 'c'
console.log(/ab+c/.test("ac")); // false — no b's (at least 1 required)
console.log(/ab+c/.test("abc")); // true — one b
console.log(/ab+c/.test("abbbbc")); // true — four b's
\d+ — One or more digits — the most common use:
// Find all numbers in text
const text = "Order 1 has 250 items and order 12 has 3 items";
console.log(text.match(/\d+/g)); // ["1", "250", "12", "3"]
? — Zero or One (Optional)
Makes the preceding element optional — it can appear once or not at all.
// "colou?r" — optional 'u' (matches British and American English)
console.log(/colou?r/.test("color")); // true — no u
console.log(/colou?r/.test("colour")); // true — with u
console.log(/colou?r/.test("colouur")); // false — two u's
// Optional "s" in a URL scheme
console.log(/https?:\/\//.test("http://")); // true
console.log(/https?:\/\//.test("https://")); // true
{n} — Exactly n Times
// Exactly 4 digits
console.log(/^\d{4}$/.test("2024")); // true
console.log(/^\d{4}$/.test("202")); // false (3 digits)
console.log(/^\d{4}$/.test("20245")); // false (5 digits)
// Exactly 3 letters
console.log(/^[A-Z]{3}$/.test("ABC")); // true
console.log(/^[A-Z]{3}$/.test("AB")); // false
{n,} — n or More Times
// 3 or more digits
console.log(/\d{3,}/.test("12")); // false
console.log(/\d{3,}/.test("123")); // true
console.log(/\d{3,}/.test("12345")); // true
{n,m} — Between n and m Times (Inclusive)
// Between 2 and 5 digits
console.log(/^\d{2,5}$/.test("1")); // false (too few)
console.log(/^\d{2,5}$/.test("12")); // true
console.log(/^\d{2,5}$/.test("12345")); // true
console.log(/^\d{2,5}$/.test("123456")); // false (too many)
🏢 REAL WORLD: {n,m} is the backbone of many validation patterns: password length, username length, phone number formats, postal codes, credit card numbers.
Greedy vs Lazy Quantifiers
By default, quantifiers are greedy — they match as much as possible while still allowing the overall pattern to succeed.
Adding ? after a quantifier makes it lazy — it matches as little as possible.
const html = "<div>Hello</div><div>World</div>";
// Greedy — matches from first < to LAST >
console.log(html.match(/<.+>/));
// ["<div>Hello</div><div>World</div>"] ← too much!
// Lazy — matches from first < to FIRST >
console.log(html.match(/<.+?>/));
// ["<div>"] ← just the first tag
▶ Expected Output:
["<div>Hello</div><div>World</div>"]
["<div>"]
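The same trade-off appears when pulling quoted strings out of text. This sketch uses a made-up input line and compares a greedy quantifier, a lazy one, and a negated character class. The negated class is a common alternative that cannot run past the closing quote:

```javascript
const line = 'say "hi" and then "bye"'; // illustrative input

const greedy = line.match(/".+"/g);      // runs to the LAST quote
const lazy = line.match(/".+?"/g);       // stops at the first closing quote
const negated = line.match(/"[^"]+"/g);  // cannot cross a quote at all

console.log(greedy);  // ['"hi" and then "bye"']
console.log(lazy);    // ['"hi"', '"bye"']
console.log(negated); // ['"hi"', '"bye"']
```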
Full lazy quantifier table:
| Greedy | Lazy | Difference |
|---|---|---|
| * | *? | Zero or more, as few as possible |
| + | +? | One or more, as few as possible |
| ? | ?? | Zero or one, prefers zero |
| {n,m} | {n,m}? | n to m, as few as possible |
// Extract content between tags lazily
const text = "<b>Bold</b> and <i>italic</i>";
console.log(text.match(/<[^>]+>[^<]*<\/[^>]+>/g));
// ["<b>Bold</b>", "<i>italic</i>"]
// Or with lazy quantifier
console.log(text.match(/<.+?>.+?<\/.+?>/g));
// ["<b>Bold</b>", "<i>italic</i>"]
Possessive Quantifiers (Atomic Matching)
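In regex flavours that support them (such as Java and PCRE), adding + after a quantifier (*+, ++, ?+, {n}+) prevents backtracking: once characters are consumed, the engine won’t give them back. They are used for performance and to avoid catastrophic backtracking.
⚠️ NOTE: JavaScript does not support possessive quantifiers or atomic groups; a pattern such as /a++/ is a SyntaxError. The usual workaround is a lookahead plus a backreference, (?=(a+))\1, because the engine never backtracks into a lookahead once it has matched.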
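Atomic (no-give-back) matching can be emulated in JavaScript with a lookahead and a backreference, where \1 means "the text captured by group 1". This is a minimal sketch of the idea with illustrative patterns, not a performance benchmark:

```javascript
const backtracking = /^a+a/;   // a+ can give one "a" back so the final "a" can match
const atomic = /^(?=(a+))\1a/; // the lookahead captures every "a"; \1 consumes them all, with no giving back

console.log(backtracking.test("aaa")); // true
console.log(atomic.test("aaa"));       // false (no "a" is left for the final "a")
```

The engine does not re-enter a lookahead after it has succeeded, so the capture inside it stays fixed at its greedy result.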
Quantifiers Summary
| Quantifier | Meaning |
|---|---|
| * | 0 or more (greedy) |
| + | 1 or more (greedy) |
| ? | 0 or 1 (optional) |
| {n} | Exactly n |
| {n,} | n or more |
| {n,m} | Between n and m |
| *? | 0 or more (lazy) |
| +? | 1 or more (lazy) |
| ?? | 0 or 1 (lazy, prefers 0) |
| {n,m}? | n to m (lazy) |
8. Topic 7 — Groups & Patterns
Phase 1 — Conceptual Understanding
Groups let you treat multiple characters as a single unit — applying quantifiers to them, capturing their matched text, or creating non-capturing groupings for alternation.
(...) — Capturing Group
Wraps part of a pattern to:
- Treat multiple characters as one unit (for quantifiers, alternation)
- Capture the matched text so you can extract it
// Without group — + applies only to 'b'
console.log(/ab+/.test("abb")); // true (a then 2 b's)
// With group: + applies to the whole "(ab)"; the anchors force the entire string to match
console.log(/^(ab)+$/.test("ababab")); // true (ab repeated 3 times)
console.log(/^(ab)+$/.test("abb")); // false: "abb" is not made up of repeated "ab"
Capturing — extracting matched substrings:
// Capture year, month, day from a date string
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = datePattern.exec("Today is 2024-03-15");
console.log(match[0]); // "2024-03-15" — full match
console.log(match[1]); // "2024" — group 1 (year)
console.log(match[2]); // "03" — group 2 (month)
console.log(match[3]); // "15" — group 3 (day)
▶ Expected Output:
2024-03-15
2024
03
15
Named Capturing Groups (?<name>...)
Instead of accessing captures by index (match[1]), name them for clarity using (?<name>...).
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = datePattern.exec("Today is 2024-03-15");
console.log(match.groups.year); // "2024"
console.log(match.groups.month); // "03"
console.log(match.groups.day); // "15"
// Also usable in replace with $<name>
const reformatted = "2024-03-15".replace(
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
"$<day>/$<month>/$<year>"
);
console.log(reformatted); // "15/03/2024"
▶ Expected Output:
2024
03
15
15/03/2024
🏢 REAL WORLD: Named groups make complex patterns self-documenting. A date regex with .groups.year is far clearer than one with [1], especially when patterns have many groups.
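Named groups also pair well with object destructuring. The sketch below assumes a hypothetical helper called parseIsoDate. It checks only the YYYY-MM-DD shape, not whether the month or day is in range:

```javascript
// Hypothetical helper: pull the numeric parts out of a YYYY-MM-DD string
function parseIsoDate(text) {
  const match = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/.exec(text);
  if (match === null) return null; // wrong shape

  const { year, month, day } = match.groups; // captured values are strings
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate("2024-03-15")); // { year: 2024, month: 3, day: 15 }
console.log(parseIsoDate("15/03/2024")); // null
```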
(?:...) — Non-Capturing Group
Groups characters for quantifiers or alternation WITHOUT capturing the match. More efficient when you don’t need the captured text.
// Capturing group — creates a capture (slower, uses memory)
const a = /(\d{3})-(\d{4})/.exec("555-1234");
console.log(a[1], a[2]); // "555" "1234"
// Non-capturing — groups for structure, no capture
const b = /(?:\d{3})-(?:\d{4})/.exec("555-1234");
console.log(b[1]); // undefined — no captures!
// Non-capturing for alternation
const color = /gr(?:a|e)y/; // matches "gray" or "grey"
console.log(color.test("gray")); // true
console.log(color.test("grey")); // true
💡 TIP: Use (?:...) whenever you need grouping but don’t need to extract the captured text. It avoids the capture overhead and keeps your capture group numbering clean.
Backreferences \1, \2 — Refer Back to a Captured Group
A backreference matches the same text that was captured by an earlier group. Use \1 for group 1, \2 for group 2, and so on.
// \1 must match the same word as group 1 captured
const doubled = /(\b\w+\b) \1/;
console.log(doubled.test("the the")); // true — "the" repeated
console.log(doubled.test("cat cat")); // true — "cat" repeated
console.log(doubled.test("the cat")); // false — different words
// Find repeated words in text
const text = "I think that that is a problem problem.";
const matches = text.match(/\b(\w+)\b \1\b/g);
console.log(matches); // ["that that", "problem problem"]
▶ Expected Output:
true
true
false
["that that", "problem problem"]
Named backreference \k<name>:
// Match opening and closing HTML tag pairs
const tagPattern = /<(?<tag>\w+)>[^<]*<\/\k<tag>>/;
console.log(tagPattern.test("<div>Hello</div>")); // true
console.log(tagPattern.test("<b>Bold</b>")); // true
console.log(tagPattern.test("<div>Hello</span>")); // false — mismatched tags
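Another frequent use of a backreference is matching a quoted string whose closing quote must be the same character as its opening quote. This short sketch uses a made-up sentence; group 1 is the quote character and group 2 is the content:

```javascript
const quoted = /(["'])(.*?)\1/g; // \1 = whichever quote character group 1 captured
const sentence = `He said "it's fine" and then 'ok'`; // illustrative input

const contents = [...sentence.matchAll(quoted)].map((m) => m[2]);

console.log(contents); // ["it's fine", "ok"]
```

The apostrophe inside the first string does not end the match, because group 1 captured a double quote there.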
Common Validation Patterns
These patterns combine everything covered so far:
// Email (simplified — RFC 5322 is much more complex)
const email = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
console.log(email.test("alice@example.com")); // true
console.log(email.test("alice@.com")); // false
console.log(email.test("aliceexample.com")); // false
// International phone (E.164 format)
const phone = /^\+?[1-9]\d{7,14}$/;
console.log(phone.test("+2348012345678")); // true
console.log(phone.test("08012345678")); // false (the first digit must be 1-9; E.164 numbers cannot start with 0)
console.log(phone.test("123")); // false (too short)
// URL (simplified)
const url = /^https?:\/\/[a-zA-Z0-9.-]+(?:\.[a-zA-Z]{2,})(?:\/[^\s]*)?$/;
console.log(url.test("https://example.com")); // true
console.log(url.test("http://example.com/path?q=1")); // true
console.log(url.test("ftp://example.com")); // false
// Strong password — 8+ chars, uppercase, lowercase, digit, special
const password = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;
console.log(password.test("Hello123!")); // true
console.log(password.test("hello123!")); // false (no uppercase)
console.log(password.test("Hello!")); // false (no digit, too short)
// Nigerian BVN (Bank Verification Number) — 11 digits
const bvn = /^\d{11}$/;
console.log(bvn.test("12345678901")); // true
console.log(bvn.test("1234567890")); // false (10 digits)
// Postal code — US ZIP (5 digits or ZIP+4)
const zip = /^\d{5}(?:-\d{4})?$/;
console.log(zip.test("90210")); // true
console.log(zip.test("90210-1234")); // true
console.log(zip.test("9021")); // false
// IPv4 address
const ipv4 = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;
console.log(ipv4.test("192.168.1.1")); // true
console.log(ipv4.test("256.0.0.1")); // false
console.log(ipv4.test("10.0.0.1")); // true
9. Topic 8 — The RegExp Object
Phase 1 — Conceptual Understanding
When you create a regex (either with literal syntax or new RegExp()), you get a RegExp object with properties and methods you can use to inspect and drive the regex engine directly.
RegExp Properties
const re = /hello\d+/gim;
console.log(re.source); // "hello\d+" — the pattern text (no slashes)
console.log(re.flags); // "gim" — all flags as a string (sorted)
console.log(re.global); // true — g flag set?
console.log(re.ignoreCase);// true — i flag set?
console.log(re.multiline); // true — m flag set?
console.log(re.sticky); // false — y flag set?
console.log(re.unicode); // false — u flag set?
console.log(re.dotAll); // false — s flag set?
console.log(re.hasIndices);// false — d flag set?
console.log(re.lastIndex); // 0 — position for next search (g/y only)
▶ Expected Output:
hello\d+
gim
true
true
true
false
false
false
false
0
re.lastIndex — Position Tracking
When using the g or y flag, lastIndex tracks where the next search will start. It updates automatically after each match.
const re = /\d+/g;
const text = "12 and 34 and 56";
let match;
while ((match = re.exec(text)) !== null) {
console.log(`Found "${match[0]}" at index ${match.index}, lastIndex now ${re.lastIndex}`);
}
▶ Expected Output:
Found "12" at index 0, lastIndex now 2
Found "34" at index 7, lastIndex now 9
Found "56" at index 14, lastIndex now 16
Inspecting a RegExp Object
const emailRe = /^[^\s@]+@[^\s@]+\.[^\s@]+$/i;
console.log("Source:", emailRe.source);
console.log("Flags:", emailRe.flags);
console.log("Is global:", emailRe.global);
console.log("Is case-insensitive:", emailRe.ignoreCase);
// Build a new regex from an existing one, adding flags
const withG = new RegExp(emailRe.source, emailRe.flags + "g");
console.log("New flags:", withG.flags);
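One caveat with the flag concatenation above: if the source regex already carries the flag being added, new RegExp() rejects the duplicate with a SyntaxError. A minimal sketch of a guard follows; addFlag is a hypothetical helper name used only for illustration, not a built-in.

```javascript
// Hypothetical helper: return a copy of `re` with `flag` added,
// leaving the flags unchanged if the flag is already present.
function addFlag(re, flag) {
  const flags = re.flags.includes(flag) ? re.flags : re.flags + flag;
  return new RegExp(re.source, flags);
}

const caseInsensitive = /cat/i;
const globalCopy = addFlag(caseInsensitive, "g");
console.log(globalCopy.flags);               // "gi"
console.log(addFlag(globalCopy, "g").flags); // "gi" (no duplicate, no error)

// Without the guard, a duplicate flag is rejected:
let duplicateError = null;
try {
  new RegExp("cat", "gg");
} catch (err) {
  duplicateError = err;
}
console.log(duplicateError instanceof SyntaxError); // true
```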
Modifying lastIndex Manually
const re = /\d+/g;
const text = "100 200 300 400";
re.lastIndex = 8; // skip past the first two numbers
const match = re.exec(text);
console.log(match[0]); // "300" — started searching from index 8
// Always reset before reusing a global regex
re.lastIndex = 0;
console.log(re.exec(text)[0]); // "100" — starts fresh
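The same position tracking affects test(): calling it repeatedly on one global regex can give alternating results. The sketch below illustrates the effect and two common ways around it; the variable names are illustrative only.

```javascript
const hasDigits = /\d+/g;

// Each successful test() moves lastIndex past the match, so the next
// call starts searching from there instead of from index 0.
const first = hasDigits.test("abc 123");  // true, lastIndex is now 7
const second = hasDigits.test("abc 123"); // false, search began at 7; lastIndex resets to 0
const third = hasDigits.test("abc 123");  // true again

console.log(first, second, third); // true false true

// 1. Drop the g flag for plain yes/no checks
const noGlobal = /\d+/;
console.log(noGlobal.test("abc 123"), noGlobal.test("abc 123")); // true true

// 2. Or reset lastIndex before each reuse
hasDigits.lastIndex = 0;
console.log(hasDigits.test("abc 123")); // true
```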
RegExp.prototype[Symbol.match] — Used Internally by String.match()
Regex objects implement [Symbol.match], [Symbol.matchAll], [Symbol.replace], [Symbol.search], and [Symbol.split]. String.prototype.match(re) works by calling the regex’s own [Symbol.match] method.
// These two are identical:
"hello world".match(/\w+/g);
/\w+/g[Symbol.match]("hello world");
// Both: ["hello", "world"]
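Because the string method only looks for a [Symbol.match] method, any object that provides one can be passed to match(). The following is a small sketch of that idea; vowelCounter is a made-up object for illustration, not part of any library.

```javascript
// Hypothetical matcher object (not a RegExp): it only needs a
// [Symbol.match] method for String.prototype.match() to call it.
const vowelCounter = {
  [Symbol.match](str) {
    const vowels = str.match(/[aeiou]/gi) || [];
    return { count: vowels.length, vowels };
  },
};

const summary = "Regular Expressions".match(vowelCounter);
console.log(summary.count);           // 7
console.log(summary.vowels.join("")); // "euaEeio"
```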
10. Topic 9 — RegExp & String Methods
Phase 1 — Conceptual Understanding
There are eight primary methods for working with regular expressions in JavaScript: six on strings and two on the RegExp object itself.
RegExp.prototype.test(string) — Quick True/False Check
The simplest regex method. Returns true if the pattern matches anywhere in the string, false otherwise. Use it when you just need to know if something matches.
const hasDigit = /\d/;
const isEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
console.log(hasDigit.test("hello5world")); // true
console.log(hasDigit.test("helloworld")); // false
console.log(isEmail.test("alice@example.com")); // true
console.log(isEmail.test("not-an-email")); // false
💡 TIP:
.test() is the fastest method for pure yes/no checks. Use it for form validation, filtering, and conditional logic.
RegExp.prototype.exec(string) — Detailed Match Info
Returns a match array with detailed information, or null if no match. More powerful than test() — gives you the matched text, index, and capture groups.
const pattern = /(\d{4})-(\d{2})-(\d{2})/;
const result = pattern.exec("Event date: 2024-03-15");
if (result) {
console.log(result[0]); // "2024-03-15" — full match
console.log(result[1]); // "2024" — group 1
console.log(result[2]); // "03" — group 2
console.log(result[3]); // "15" — group 3
console.log(result.index); // 12 — position in string
console.log(result.input); // "Event date: 2024-03-15"
}
▶ Expected Output:
2024-03-15
2024
03
15
12
Event date: 2024-03-15
Iterate ALL matches with exec() in a loop:
const re = /\b\w{5}\b/g; // all 5-letter words
const text = "hello world today is a great Monday";
let match;
while ((match = re.exec(text)) !== null) {
console.log(`"${match[0]}" at index ${match.index}`);
}
▶ Expected Output:
"hello" at index 0
"world" at index 6
"today" at index 12
"great" at index 23
String.prototype.match(regexp) — Get All Matches
Without g flag: behaves like exec() — returns first match with groups.
With g flag: returns an array of all matched strings (no group info).
const text = "Call 080-1234-5678 or 081-9876-5432 for details";
// Without g — first match + groups
const first = text.match(/(\d{3})-(\d{4})-(\d{4})/);
console.log(first[0]); // "080-1234-5678"
console.log(first[1]); // "080"
console.log(first[2]); // "1234"
// With g — ALL matches (no group info)
const all = text.match(/\d{3}-\d{4}-\d{4}/g);
console.log(all); // ["080-1234-5678", "081-9876-5432"]
▶ Expected Output:
080-1234-5678
080
1234
["080-1234-5678", "081-9876-5432"]
⚠️ WATCH OUT:
With the g flag, match() returns an array of strings only; capture groups are ignored. Use matchAll() if you need groups from all matches.
String.prototype.matchAll(regexp) — All Matches WITH Groups
Returns an iterator of all match results — each with full group information. The regex MUST have the g flag.
const text = "2024-01-15 and 2024-03-22 and 2023-12-01";
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
for (const match of text.matchAll(re)) {
const { year, month, day } = match.groups;
console.log(`Year: ${year}, Month: ${month}, Day: ${day}`);
}
▶ Expected Output:
Year: 2024, Month: 01, Day: 15
Year: 2024, Month: 03, Day: 22
Year: 2023, Month: 12, Day: 01
💡 TIP:
matchAll() is the modern replacement for exec() loops. It is cleaner, works with named groups, and returns an iterator you can spread into an array:
const matches = [...text.matchAll(re)];
console.log(matches.length); // 3
console.log(matches[0].groups.year); // "2024"
String.prototype.search(regexp) — Find Position of First Match
Returns the index of the first match, or -1 if not found. Like indexOf but for regex patterns.
const text = "Hello World 2024";
console.log(text.search(/\d+/)); // 12 — position of "2024"
console.log(text.search(/xyz/)); // -1 — not found
console.log(text.search(/world/i)); // 6 — case-insensitive
⚠️ WATCH OUT:
search() always searches from the beginning: it ignores the g flag and lastIndex. For finding a position with a literal string, indexOf is faster. search() shines when you need a pattern-based position.
String.prototype.replace(regexp, replacement) — Find and Replace
The most powerful string method. Replaces match(es) with a replacement string or the result of a function.
Simple replacement:
// Replace first match only (no g flag)
console.log("hello hello".replace(/hello/, "goodbye")); // "goodbye hello"
// Replace ALL matches (g flag)
console.log("hello hello".replace(/hello/g, "goodbye")); // "goodbye goodbye"
// Case-insensitive + global
console.log("Hello HELLO hello".replace(/hello/gi, "hi")); // "hi hi hi"
Replacement with capture group references $1, $2, $<name>:
// Reformat date: YYYY-MM-DD → DD/MM/YYYY
const date = "2024-03-15";
const reformatted = date.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
console.log(reformatted); // "15/03/2024"
// Using named groups
const reformatted2 = date.replace(
/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/,
"$<d>/$<m>/$<y>"
);
console.log(reformatted2); // "15/03/2024"
▶ Expected Output:
15/03/2024
15/03/2024
Special replacement tokens:
| Token | Meaning |
|---|---|
| $$ | Literal $ |
| $& | Entire matched string |
| $` | Text before match |
| $' | Text after match |
| $n | n-th capture group |
| $<name> | Named capture group |
// Wrap every number in brackets using $&
console.log("I have 5 cats and 3 dogs".replace(/\d+/g, "[$&]"));
// "I have [5] cats and [3] dogs"
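The remaining tokens from the table behave as follows. This is a small illustrative sketch; the sample strings and variable names are made up.

```javascript
const sentence = "left-MID-right";

// $& is the match itself
const wrapped = sentence.replace(/MID/, "<$&>");
console.log(wrapped);    // "left-<MID>-right"

// $` is everything before the match
const withBefore = sentence.replace(/MID/, "[$`]");
console.log(withBefore); // "left-[left-]-right"

// $' is everything after the match
const withAfter = sentence.replace(/MID/, "[$']");
console.log(withAfter);  // "left-[-right]-right"

// $$ produces one literal dollar sign, followed here by the match ($&)
const priced = "cost: 5".replace(/\d+/, "$$$&");
console.log(priced);     // "cost: $5"
```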
Replacement with a function (most powerful):
// Replace numbers with their doubled values
const result = "Price: 50 and discount: 10".replace(/\d+/g, n => n * 2);
console.log(result); // "Price: 100 and discount: 20"
// Capitalise first letter of each word
const title = "the quick brown fox".replace(/\b\w/g, c => c.toUpperCase());
console.log(title); // "The Quick Brown Fox"
// Censor profanity (replace with asterisks)
const censor = (text, words) => {
const re = new RegExp(`\\b(${words.join("|")})\\b`, "gi");
return text.replace(re, w => "*".repeat(w.length));
};
console.log(censor("What the heck is going on", ["heck"]));
// "What the **** is going on"
▶ Expected Output:
Price: 100 and discount: 20
The Quick Brown Fox
What the **** is going on
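The censor helper above interpolates each word straight into the pattern, so a word containing regex metacharacters (such as "a.b" or "c++") would change the pattern's meaning or fail to compile. One possible remedy is to escape the words first. In this sketch, escapeRegExp and safeCensor are hypothetical helper names. Word boundaries (\b) are left out because \b only matches next to a word character, so it would never match after the final + of "c++".

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter
// so the text is matched literally.
const escapeRegExp = text => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Variant of censor() that escapes each word before building the pattern.
const safeCensor = (text, words) => {
  const alternatives = words.map(escapeRegExp).join("|");
  const re = new RegExp(`(${alternatives})`, "gi");
  return text.replace(re, w => "*".repeat(w.length));
};

console.log(escapeRegExp("a.b"));                     // a\.b
console.log(safeCensor("Use a.b not axb", ["a.b"]));  // "Use *** not axb"
console.log(safeCensor("I like C++ a lot", ["c++"])); // "I like *** a lot"
```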
String.prototype.replaceAll(string|regexp, replacement) — Replace All
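Like replace() with g. It also accepts a plain string as the first argument and then replaces every occurrence (replace() with a string pattern replaces only the first). When used with a regex, the g flag is required; a regex without it throws a TypeError.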
// With plain string
console.log("a.b.c.d".replaceAll(".", "/")); // "a/b/c/d"
// With regex (g flag required)
console.log("hello world".replaceAll(/\w+/g, w => w.toUpperCase()));
// "HELLO WORLD"
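The g-flag requirement can be checked directly. A short sketch follows; the error is caught so the script keeps running.

```javascript
let replaceAllError = null;
try {
  "a-b-c".replaceAll(/-/, "+"); // regex without the g flag
} catch (err) {
  replaceAllError = err;
}
console.log(replaceAllError instanceof TypeError); // true

// Either add the g flag or pass a plain string:
const withRegex = "a-b-c".replaceAll(/-/g, "+");
const withString = "a-b-c".replaceAll("-", "+");
console.log(withRegex, withString); // a+b+c a+b+c
```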
String.prototype.split(regexp) — Split on a Pattern
Splits a string into an array, using the regex as the delimiter. More flexible than splitting on a fixed string.
// Split on any whitespace (including multiple spaces/tabs)
console.log(" hello world ".split(/\s+/).filter(Boolean));
// ["hello", "world"]
// Split on comma or semicolon or pipe
console.log("a,b;c|d".split(/[,;|]/));
// ["a", "b", "c", "d"]
// Split on digit runs
console.log("abc123def456ghi".split(/\d+/));
// ["abc", "def", "ghi"]
// Split but KEEP the delimiters (using a capturing group)
console.log("one, two; three".split(/(,|;)\s*/));
// ["one", ",", "two", ";", "three"]
▶ Expected Output:
["hello", "world"]
["a", "b", "c", "d"]
["abc", "def", "ghi"]
["one", ",", "two", ";", "three"]
Methods Summary Table
| Method | Returns | Use When |
|---|---|---|
| re.test(str) | Boolean | Quick check: does it match? |
| re.exec(str) | Array or null | Need match position + groups, one at a time |
| str.match(re) | Array or null | Get all matched strings (with g) |
| str.matchAll(re) | Iterator | All matches WITH group info (needs g) |
| str.search(re) | Number (-1 = no match) | Find position of first match |
| str.replace(re, rep) | String | Substitute match(es) with string/function |
| str.replaceAll(re, rep) | String | Replace all (regex needs g) |
| str.split(re) | Array | Split string on pattern delimiter |
11. Applied Exercises
Phase 2 — Applied Exercises
Exercise 1 — Pattern Detective 🔍
Objective: Read patterns and predict what they match before running them. Build “regex reading” skills.
Scenario: You are reviewing a codebase’s input validation regexes. For each, explain what it validates and give a passing and failing example.
Warm-up Micro-Demo:
const re = /^\d{3}-\d{2}-\d{4}$/;
// Read it: ^ start, 3 digits, literal -, 2 digits, literal -, 4 digits, $ end
// Matches: US Social Security Number format (123-45-6789)
console.log(re.test("123-45-6789")); // true
console.log(re.test("12-345-6789")); // false
Task A — Analyse These Patterns
// Pattern set to analyse:
const patterns = {
p1: /^[a-zA-Z][a-zA-Z0-9_]{2,19}$/,
p2: /^(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$]).{8,}$/,
p3: /^\+?(\d{1,3})?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/,
p4: /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/,
p5: /^([01]?\d|2[0-3]):[0-5]\d$/,
};
// Test and explain each pattern:
// p1 — Username rules
console.log("=== p1: Username ===");
console.log(patterns.p1.test("alice99")); // true
console.log(patterns.p1.test("9alice")); // false — must start with letter
console.log(patterns.p1.test("al")); // false — too short (min 3 total)
console.log(patterns.p1.test("alice!")); // false — ! not allowed
// p2 — Strong password
console.log("\n=== p2: Password ===");
console.log(patterns.p2.test("Secure1!pass")); // true
console.log(patterns.p2.test("secure1!pass")); // false — no uppercase
console.log(patterns.p2.test("Securepass!")); // false — no digit
console.log(patterns.p2.test("Se1!")); // false — too short
// p3 — Flexible phone number
console.log("\n=== p3: Phone ===");
console.log(patterns.p3.test("(080) 123-4567")); // true
console.log(patterns.p3.test("080.123.4567")); // true
console.log(patterns.p3.test("+1 800 555 1234")); // true
console.log(patterns.p3.test("123")); // false
// p4 — CSS hex color
console.log("\n=== p4: Hex Color ===");
console.log(patterns.p4.test("#FF5733")); // true — 6-digit hex
console.log(patterns.p4.test("#F57")); // true — 3-digit shorthand
console.log(patterns.p4.test("FF5733")); // false — no #
console.log(patterns.p4.test("#GGGGGG")); // false — G not hex
// p5 — 24-hour time
console.log("\n=== p5: 24-hour Time ===");
console.log(patterns.p5.test("00:00")); // true — midnight
console.log(patterns.p5.test("23:59")); // true — last minute
console.log(patterns.p5.test("24:00")); // false — 24 invalid
console.log(patterns.p5.test("9:30")); // true — single digit hour ok
Self-check questions:
- In p2, why do the lookaheads (?=...) not conflict with each other even though they all start at the same position?
- In p4, why does | inside the group give two alternatives of different lengths?
- In p5, why can [01]?\d match 09, while 2[0-3] is needed for hours 20–23?
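One way to check predictions like these systematically is a small table-driven checker. The sketch below assumes a hypothetical helper named checkPattern; it is one possible shape, not a prescribed solution.

```javascript
// Hypothetical helper: run a pattern against [input, expected] pairs
// and record whether each prediction was right.
function checkPattern(pattern, cases) {
  return cases.map(([input, expected]) => {
    const actual = pattern.test(input);
    return { input, expected, actual, pass: actual === expected };
  });
}

const hexColor = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/;
const report = checkPattern(hexColor, [
  ["#FF5733", true],
  ["#F57", true],
  ["#FF57", false], // 4 digits fits neither alternative
  ["FF5733", false],
]);

console.table(report);
console.log(report.every(row => row.pass)); // true
```

The pattern passed in should not carry the g or y flag, since test() on such a regex carries lastIndex over from one call to the next.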
Exercise 2 — Text Extractor 📤
Objective: Use match(), matchAll(), and exec() to extract structured data from unstructured text.
Scenario: You receive a raw log file. Extract all relevant data fields using regex.
Warm-up Micro-Demo:
const log = "[2024-01-15 09:23:11] ERROR: Connection timeout";
const ts = log.match(/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/);
console.log("Timestamp:", ts[0]); // "2024-01-15 09:23:11"
Task A — Log Parser
const logData = `
[2024-03-15 08:00:01] INFO: Server started on port 3000
[2024-03-15 08:01:22] INFO: User alice@example.com logged in from 192.168.1.10
[2024-03-15 08:05:44] WARN: High memory usage: 87%
[2024-03-15 08:10:03] ERROR: Database connection failed (attempt 1 of 3)
[2024-03-15 08:10:15] ERROR: Database connection failed (attempt 2 of 3)
[2024-03-15 08:12:00] INFO: User bob@test.org logged in from 10.0.0.5
[2024-03-15 08:15:33] ERROR: Disk space critical: 95% used
[2024-03-15 08:20:11] INFO: Cache cleared — 1250 items removed
`.trim();
// 1. Extract all timestamps
const timestamps = logData.match(/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/g);
console.log("Timestamps:", timestamps.length, "entries");
// 2. Count log levels
const levels = { INFO: 0, WARN: 0, ERROR: 0 };
for (const [, level] of logData.matchAll(/\] (INFO|WARN|ERROR):/g)) {
levels[level]++;
}
console.log("\nLog level counts:", levels);
// 3. Extract all email addresses
const emails = logData.match(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g);
console.log("\nEmails found:", emails);
// 4. Extract all IP addresses
const ips = logData.match(/\b(?:\d{1,3}\.){3}\d{1,3}\b/g);
console.log("IPs found:", ips);
// 5. Extract ERROR entries with full details using named groups
const errorRe = /\[(?<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] ERROR: (?<msg>.+)/g;
console.log("\nERROR entries:");
for (const match of logData.matchAll(errorRe)) {
console.log(` [${match.groups.ts}] ${match.groups.msg}`);
}
// 6. Extract percentage values
const percentages = logData.match(/\d+(?=%)/g);
console.log("\nPercentages mentioned:", percentages);
Expected Output:
Timestamps: 8 entries
Log level counts: { INFO: 4, WARN: 1, ERROR: 3 }
Emails found: ["alice@example.com", "bob@test.org"]
IPs found: ["192.168.1.10", "10.0.0.5"]
ERROR entries:
[2024-03-15 08:10:03] Database connection failed (attempt 1 of 3)
[2024-03-15 08:10:15] Database connection failed (attempt 2 of 3)
[2024-03-15 08:15:33] Disk space critical: 95% used
Percentages mentioned: ["87", "95"]
Self-check questions:
- Why is matchAll() better than exec() in a loop for extracting multiple groups?
- Why does /\d+(?=%)/g use a lookahead for % instead of /\d+%/g?
- What would happen if you forgot the g flag when using matchAll()?
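The last two questions can be explored with a quick experiment. This is a minimal sketch on a made-up string.

```javascript
const usage = "CPU 87%, disk 95%";

// The lookahead checks for % but leaves it out of the match
const numbersOnly = usage.match(/\d+(?=%)/g);
console.log(numbersOnly); // ["87", "95"]

// Consuming the % makes it part of every result
const withSign = usage.match(/\d+%/g);
console.log(withSign);    // ["87%", "95%"]

// matchAll() refuses a regex without the g flag
let matchAllError = null;
try {
  usage.matchAll(/\d+/);
} catch (err) {
  matchAllError = err;
}
console.log(matchAllError instanceof TypeError); // true
```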
Exercise 3 — Text Transformer ✏️
Objective: Use replace() with functions to transform text programmatically.
Scenario: You’re building a markdown-to-HTML converter for a blogging platform.
Warm-up Micro-Demo:
// Bold: **text** → <strong>text</strong>
const bold = "This is **important** text".replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
console.log(bold); // "This is <strong>important</strong> text"
Task A — Markdown Converter
function markdownToHtml(md) {
return md
// Headings: ### → h3, ## → h2, # → h1
.replace(/^### (.+)$/gm, "<h3>$1</h3>")
.replace(/^## (.+)$/gm, "<h2>$1</h2>")
.replace(/^# (.+)$/gm, "<h1>$1</h1>")
// Bold: **text** → <strong>text</strong>
.replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>")
// Italic: *text* → <em>text</em>
.replace(/\*(.+?)\*/g, "<em>$1</em>")
// Inline code: `code` → <code>code</code>
.replace(/`([^`]+)`/g, "<code>$1</code>")
// Links: [text](url) → <a href="url">text</a>
.replace(/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2">$1</a>')
// Unordered list items: - item → <li>item</li>
.replace(/^- (.+)$/gm, "<li>$1</li>")
// Auto-link URLs not already in anchor tags
.replace(/(?<!href=")https?:\/\/[\w.-]+(?:\/[\w./?=&%-]*)*/g, '<a href="$&">$&</a>')
// Runs of blank lines → <hr> separator between blocks
.replace(/\n{2,}/g, "\n\n<hr>\n\n");
}
const markdown = `
# My Blog Post
This is a **bold statement** and this is *italic*.
## Features
- Fast and reliable
- Supports \`inline code\`
- Links like [Google](https://google.com)
### Note
Visit https://example.com for more info.
`.trim();
console.log(markdownToHtml(markdown));
Expected Output (sample):
<h1>My Blog Post</h1>
...
<strong>bold statement</strong> ... <em>italic</em>
...
<h2>Features</h2>
<li>Fast and reliable</li>
<li>Supports <code>inline code</code></li>
<li>Links like <a href="https://google.com">Google</a></li>
...
<a href="https://example.com">https://example.com</a>
Self-check questions:
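- Would the output change if the # rule ran before the ### rule? (Consider the ^ anchor and the space required after the # characters.) By contrast, why must **bold** be replaced before *italic*?
- Why does the link pattern use [^\]]+ inside [...] instead of .+?
- Why does the URL auto-linker use a negative lookbehind (?<!href=")?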
12. Mini Project — Form Validator & Text Processor
Phase 3 — Project Simulation
Real-world scenario: You’re building a registration form validator and data sanitiser for a multi-country web application. The system must:
- Validate fields in real time (email, phone, password, username, URL)
- Extract and structure data from a pasted contact list
- Sanitise and redact sensitive information from text
- Generate a validation report
🔵 Stage 1 — Validation Engine
Goal: Build a reusable validation system using regex patterns.
Simple stage preview:
const isEmail = str => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(str);
console.log(isEmail("a@b.com")); // true
console.log(isEmail("bad")); // false
Stage 1 Full Code:
"use strict";
// ─── Validation Patterns ───────────────────────────────────────
const PATTERNS = {
// Username: 3-20 chars, starts with letter, letters/digits/underscores
username: /^[a-zA-Z][a-zA-Z0-9_]{2,19}$/,
// Email: local@domain.tld
email: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
// Strong password: 8+ chars, uppercase, lowercase, digit, special char
password: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+]).{8,}$/,
// Phone: flexible international — E.164 or local formats
phone: /^\+?(?:\d[\s.-]?){7,14}\d$/,
// URL: http/https with optional path
url: /^https?:\/\/(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(?:\/[^\s]*)?$/,
// Date: YYYY-MM-DD with basic range checking
date: /^(?:19|20)\d{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$/,
// Hex color: #RGB or #RRGGBB
hexColor: /^#(?:[0-9a-fA-F]{3}){1,2}$/,
// Postal code: flexible — US (12345 / 12345-6789), UK (SW1A 1AA), generic
postalCode: /^(?:\d{5}(?:-\d{4})?|[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}|\d{4,6})$/i,
};
// ─── Validation Messages ───────────────────────────────────────
const MESSAGES = {
username: "3–20 chars, must start with a letter, letters/digits/underscore only",
email: "Must be a valid email (user@domain.tld)",
password: "8+ chars with uppercase, lowercase, digit, and special character",
phone: "Valid international or local phone number",
url: "Must be a valid http/https URL",
date: "Must be YYYY-MM-DD format (year 1900–2099)",
hexColor: "Must be a valid CSS hex colour (#RGB or #RRGGBB)",
postalCode:"Must be a valid postal/zip code",
};
// ─── Core Validator ────────────────────────────────────────────
function validate(fieldName, value) {
const pattern = PATTERNS[fieldName];
if (!pattern) return { valid: false, error: `Unknown field: ${fieldName}` };
const trimmed = value.trim();
const valid = pattern.test(trimmed);
return {
field: fieldName,
value: trimmed,
valid,
error: valid ? null : MESSAGES[fieldName],
pattern: pattern.toString(),
};
}
// ─── Validate a full form object ───────────────────────────────
function validateForm(formData) {
const results = {};
let allValid = true;
for (const [field, value] of Object.entries(formData)) {
const result = validate(field, String(value));
results[field] = result;
if (!result.valid) allValid = false;
}
return { results, allValid };
}
// ─── Stage 1 Demo ─────────────────────────────────────────────
console.log("=".repeat(60));
console.log(" STAGE 1 — FORM VALIDATION ENGINE");
console.log("=".repeat(60));
const formSubmission = {
username: "alice_2024",
email: "alice@example.com",
password: "Secure1!Pass",
phone: "+44 7911 123456",
url: "https://alice-blog.com/about",
date: "2024-13-01", // ← invalid month!
hexColor: "#FF5733",
postalCode: "SW1A 1AA",
};
const { results, allValid } = validateForm(formSubmission);
for (const [field, r] of Object.entries(results)) {
const icon = r.valid ? "✅" : "❌";
console.log(`\n ${icon} ${field.padEnd(12)}: ${r.value}`);
if (!r.valid) console.log(` Error: ${r.error}`);
}
console.log(`\n${"─".repeat(60)}`);
console.log(` Form status: ${allValid ? "✅ All fields valid" : "❌ Fix the errors above"}`);
▶ Expected Output:
============================================================
STAGE 1 — FORM VALIDATION ENGINE
============================================================
✅ username : alice_2024
✅ email : alice@example.com
✅ password : Secure1!Pass
✅ phone : +44 7911 123456
✅ url : https://alice-blog.com/about
❌ date : 2024-13-01
Error: Must be YYYY-MM-DD format (year 1900–2099)
✅ hexColor : #FF5733
✅ postalCode : SW1A 1AA
────────────────────────────────────────────────────────────
Form status: ❌ Fix the errors above
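The date pattern only checks digit ranges, so a string such as "2024-02-31" passes even though no such day exists. A possible follow-up check uses the built-in Date object. In this sketch, dateShape and isRealDate are hypothetical names, and the regex is the Stage 1 date pattern with named groups added.

```javascript
const dateShape = /^(?<y>(?:19|20)\d{2})-(?<m>0[1-9]|1[0-2])-(?<d>0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: the regex checks the shape, Date checks the calendar.
function isRealDate(text) {
  const match = dateShape.exec(text);
  if (!match) return false;
  const year = Number(match.groups.y);
  const month = Number(match.groups.m);
  const day = Number(match.groups.d);
  // Date.UTC rolls impossible days over into the next month,
  // so a real date must come back with the same month and day.
  const date = new Date(Date.UTC(year, month - 1, day));
  return date.getUTCMonth() === month - 1 && date.getUTCDate() === day;
}

console.log(dateShape.test("2024-02-31")); // true  (the shape is fine)
console.log(isRealDate("2024-02-31"));     // false (no such day)
console.log(isRealDate("2024-02-29"));     // true  (2024 is a leap year)
console.log(isRealDate("2023-02-29"));     // false
```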
🟢 Stage 2 — Contact List Parser & Extractor
Goal: Parse a pasted block of messy contact text, extract structured data, and normalise it.
Simple stage preview:
const line = "John Doe | john@doe.com | +1-800-555-0199";
const [name, email, phone] = line.split(/\s*\|\s*/);
console.log({ name, email, phone });
Stage 2 Full Code:
function parseContactList(rawText) {
const contacts = [];
// Split into non-empty lines
const lines = rawText.split("\n").map(l => l.trim()).filter(Boolean);
// Patterns for extraction
const emailRe = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;
const phoneRe = /\+?(?:\d[\s.()\-]?){7,14}\d/;
const urlRe = /https?:\/\/[^\s,]+/;
for (const line of lines) {
// Try different delimiters: |, comma, semicolon, tabs
const parts = line.split(/\s*[|,;]\s*|\t+/).map(p => p.trim());
const email = (line.match(emailRe) || [])[0] || null;
const phone = (line.match(phoneRe) || [])[0] || null;
const url = (line.match(urlRe) || [])[0] || null;
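// Guess the name: first part that is not an email/phone/URL
const looksLikeName = p =>
p.length > 1 &&
!emailRe.test(p) &&
!phoneRe.test(p) &&
!urlRe.test(p) &&
!/^\d/.test(p); // doesn't start with digit
// Fallback for lines without a |, comma, semicolon, or tab delimiter:
// remove the extracted fields and stray separators, keep what is left
const leftover = line
.replace(urlRe, " ").replace(emailRe, " ").replace(phoneRe, " ")
.replace(/[<>\u2014]/g, " ").replace(/\s+/g, " ").trim();
const name = parts.find(looksLikeName) ||
(looksLikeName(leftover) ? leftover : "Unknown");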
// Normalise phone — strip non-digits except leading +
const normPhone = phone
? (phone.startsWith("+") ? "+" : "") + phone.replace(/[^\d]/g, "")
: null;
contacts.push({ name, email, phone: normPhone, url });
}
return contacts;
}
const rawContacts = `
Alice Nkosi | alice@example.com | +234 80 1234 5678 | https://alice.dev
Bob Asante, bob.asante@company.org, 0501234567
carol@test.com; Carol Martini; +44-7700-900123
David Obi <david.obi@firm.io> 08023456789 https://davidobi.co
Fatima Rashid — fatima_r@uni.edu — +49 151 23456789
`.trim();
console.log("\n" + "=".repeat(60));
console.log(" STAGE 2 — CONTACT LIST PARSER");
console.log("=".repeat(60));
const contacts = parseContactList(rawContacts);
contacts.forEach((c, i) => {
console.log(`\n Contact ${i + 1}: ${c.name}`);
console.log(` Email : ${c.email || "—"}`);
console.log(` Phone : ${c.phone || "—"}`);
console.log(` URL : ${c.url || "—"}`);
});
console.log(`\nTotal contacts parsed: ${contacts.length}`);
// Stats
const withEmail = contacts.filter(c => c.email).length;
const withPhone = contacts.filter(c => c.phone).length;
console.log(`With email: ${withEmail}/${contacts.length}`);
console.log(`With phone: ${withPhone}/${contacts.length}`);
▶ Expected Output:
============================================================
STAGE 2 — CONTACT LIST PARSER
============================================================
Contact 1: Alice Nkosi
Email : alice@example.com
Phone : +2348012345678
URL : https://alice.dev
Contact 2: Bob Asante
Email : bob.asante@company.org
Phone : 0501234567
URL : —
Contact 3: Carol Martini
Email : carol@test.com
Phone : +447700900123
URL : —
Contact 4: David Obi
Email : david.obi@firm.io
Phone : 08023456789
URL : https://davidobi.co
Contact 5: Fatima Rashid
Email : fatima_r@uni.edu
Phone : +4915123456789
URL : —
Total contacts parsed: 5
With email: 5/5
With phone: 5/5
🟠 Stage 3 — Text Sanitiser & Redaction Report
Goal: Redact sensitive data (emails, phones, credit cards, SSNs) from a text block. Generate a full report of what was found and replaced.
Stage 3 Full Code:
function sanitiseText(text) {
const report = { redacted: [], counts: {} };
const rules = [
{
name: "Credit Card",
pattern: /\b(?:\d{4}[\s-]?){3}\d{4}\b/g,
replacement: match => "[CC:" + "*".repeat(match.replace(/\D/g,"").length - 4) +
match.replace(/\D/g,"").slice(-4) + "]",
},
{
name: "SSN",
pattern: /\b\d{3}-\d{2}-\d{4}\b/g,
replacement: () => "[SSN:***-**-****]",
},
{
name: "Email",
pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
replacement: match => {
const [local, domain] = match.split("@");
return local.slice(0, 2) + "***@" + domain;
},
},
{
name: "Phone",
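// (?<!\w) instead of \b lets the optional leading + be captured; dots are left out of
// the separator set so dotted IPv4 addresses are not mistaken for phone numbers
pattern: /(?<!\w)\+?(?:\d[\s-]?){6,14}\d\b/g,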
replacement: match => {
const digits = match.replace(/\D/g, "");
return "[PHONE:****" + digits.slice(-4) + "]";
},
},
{
name: "IP Address",
pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
replacement: ip => ip.replace(/\.\d+$/, ".***"),
},
];
let sanitised = text;
for (const rule of rules) {
let count = 0;
sanitised = sanitised.replace(rule.pattern, match => {
count++;
report.redacted.push({ type: rule.name, original: match });
return rule.replacement(match);
});
if (count > 0) report.counts[rule.name] = count;
}
return { sanitised, report };
}
const sensitiveText = `
Customer Support Transcript — Case #4821
Agent Notes:
The customer (alice.johnson@example.com) called at 14:23 from +1-555-867-5309.
Account linked to: bob_smith@corp.org and carol@test.io
Credit card on file: 4532 1234 5678 9010 (Visa) and 5555-4444-3333-2222 (MC)
Social Security Number provided: 123-45-6789
Remote IP logged: 192.168.10.254 and 10.0.0.1
Secondary phone: 0800 123 4567
Resolution: Refund of $150 approved. Case closed.
`.trim();
console.log("\n" + "=".repeat(60));
console.log(" STAGE 3 — TEXT SANITISER & REDACTION");
console.log("=".repeat(60));
const { sanitised, report } = sanitiseText(sensitiveText);
console.log("\n📄 SANITISED OUTPUT:\n");
console.log(sanitised);
console.log("\n📊 REDACTION REPORT:");
console.log("─".repeat(60));
for (const [type, count] of Object.entries(report.counts)) {
console.log(` ${type.padEnd(15)}: ${count} instance(s) redacted`);
}
console.log(`\n Total redactions: ${report.redacted.length}`);
console.log("\n Redacted items:");
report.redacted.forEach((r, i) => {
console.log(` ${i + 1}. [${r.type}] "${r.original}"`);
});
console.log("=".repeat(60));
▶ Expected Output:
============================================================
STAGE 3 — TEXT SANITISER & REDACTION
============================================================
📄 SANITISED OUTPUT:
Customer Support Transcript — Case #4821
Agent Notes:
The customer (al***@example.com) called at 14:23 from [PHONE:****5309].
Account linked to: bo***@corp.org and ca***@test.io
Credit card on file: [CC:************9010] (Visa) and [CC:************2222] (MC)
Social Security Number provided: [SSN:***-**-****]
Remote IP logged: 192.168.10.*** and 10.0.0.***
Secondary phone: [PHONE:****4567]
Resolution: Refund of $150 approved. Case closed.
📊 REDACTION REPORT:
────────────────────────────────────────────────────────────
Credit Card : 2 instance(s) redacted
SSN : 1 instance(s) redacted
Email : 3 instance(s) redacted
Phone : 2 instance(s) redacted
IP Address : 2 instance(s) redacted
Total redactions: 10
Redacted items:
1. [Credit Card] "4532 1234 5678 9010"
2. [Credit Card] "5555-4444-3333-2222"
3. [SSN] "123-45-6789"
4. [Email] "alice.johnson@example.com"
...
============================================================
Reflection questions:
- Why is the Credit Card pattern applied before the Phone pattern in the rules array? What could go wrong if the order were reversed?
- Why does the email redaction function split on @ and keep the domain visible? What privacy principle does partial redaction serve?
- Why does the replace() method receive a function instead of a string for most rules? What would a plain string replacement lose?
- How would you extend the sanitiser to handle IBAN numbers (GB82 WEST 1234 5698 7654 32) or passport numbers?
- How would you use the d flag (indices) to highlight exactly where in the original text each sensitive item appeared, for an audit trail?
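As a starting point for the last question, the sketch below shows what the d flag adds to each match. The sample text and the audit array are illustrative assumptions, and the d flag needs a reasonably recent engine (for example Node.js 16 or later).

```javascript
const note = "Reach me at bob@corp.org or 10.0.0.1";
const emailWithIndices = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/dg;

const audit = [];
for (const match of note.matchAll(emailWithIndices)) {
  // With the d flag, match.indices[0] holds [start, end) of the whole match
  const [start, end] = match.indices[0];
  audit.push({ text: match[0], start, end });
}

console.log(audit);
// [ { text: 'bob@corp.org', start: 12, end: 24 } ]
console.log(note.slice(audit[0].start, audit[0].end)); // "bob@corp.org"
```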
Optional advanced features:
- Add a rule for detecting credit card numbers using the Luhn algorithm inside the replacement function
- Build a regex tester function that takes a pattern string and test cases, and returns a pass/fail table
- Extend the contact parser (Stage 2) to handle multi-line entries where name and contact info span two lines
- Add a whitelistDomains option to the email redactor that does NOT redact emails from approved company domains
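For the Luhn feature in the list above, a minimal sketch of the checksum might look like this. passesLuhn is a hypothetical helper name, and the card number used is a widely published test number rather than one taken from the transcript.

```javascript
// Luhn checksum: starting from the rightmost digit, double every second
// digit, subtract 9 from results above 9, and require the total to be
// divisible by 10.
function passesLuhn(cardText) {
  const digits = cardText.replace(/\D/g, "");
  if (digits.length === 0) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let value = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      value *= 2;
      if (value > 9) value -= 9;
    }
    sum += value;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4111 1111 1111 1111")); // true
console.log(passesLuhn("4111 1111 1111 1112")); // false
```

Inside a rule's replacement function, one option is to return the match unchanged when passesLuhn(match) is false, so that random 16-digit runs are not reported as cards.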
13. Completion Checklist
- ✅ I know two ways to create a regex: literal /pattern/flags and new RegExp("pattern", "flags"), and when to use each.
- ✅ I know to double backslashes in new RegExp() strings: "\\d" = /\d/.
- ✅ I understand all 8 flags: i (case-insensitive), g (global), m (multiline), s (dotAll), d (indices), u (unicode), v (unicode sets), y (sticky).
- ✅ I know the g flag causes lastIndex to advance, and that reusing a global regex without resetting causes bugs.
- ✅ I can write character classes: [abc] (any of), [^abc] (none of), [a-z] (range).
- ✅ I know ^ inside [...] means negation, but outside means start-of-string.
- ✅ I know the hyphen - must be at the start or end of [...] (or escaped as \-) to be treated as a literal character.
- ✅ I understand all shorthand classes: \d, \D, \w, \W, \s, \S.
- ✅ I know . matches any character except newline, and that literal dots must be escaped as \. in a pattern.
- ✅ I can list the 14 special characters that need escaping: . * + ? ^ $ { } [ ] | ( ) \
- ✅ I understand alternation | and know to use groups (...) to control its scope.
- ✅ I know all four assertion types: ^/$ (anchors), \b/\B (word boundary), (?=...) / (?!...) (positive and negative lookahead), (?<=...) / (?<!...) (positive and negative lookbehind).
- ✅ I understand that lookarounds are zero-width: they check position without consuming characters.
- ✅ I can use all quantifiers: *, +, ?, {n}, {n,}, {n,m}, and their lazy versions with ?.
- ✅ I understand greedy vs lazy: greedy takes as much as possible, lazy takes as little as possible.
- ✅ I can create capturing groups (...), non-capturing groups (?:...), and named groups (?<name>...).
- ✅ I can use backreferences \1 and \k<name> to match repeated patterns.
- ✅ I know the RegExp object properties: source, flags, global, ignoreCase, multiline, lastIndex.
- ✅ I can use all eight regex-related methods: test, exec, match, matchAll, search, replace, replaceAll, split.
- ✅ I know when to use each method: test for yes/no, exec/matchAll for groups, match for all strings, replace for substitution, split for splitting, search for finding position.
- ✅ I can use replace() with a function for dynamic replacements.
- ✅ I can use named groups in replace() with $<name>.
- ✅ I completed all three exercises and the three-stage mini project.
- ✅ I can build practical validation patterns for email, password, phone, URL, date, and hex color.
- ✅ I know that Math.random() is not regex, and that real security filtering (like SQL injection prevention) needs parameterised queries; regex alone is not a security solution.
📌 One-Sentence Summary of Each Topic
Creating Regex: Regular expressions are created with literal /pattern/flags syntax (preferred — compiled at load time) or new RegExp("pattern", "flags") (required for dynamic patterns — remember to double backslashes in the string).
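As a minimal sketch of the two creation styles, the snippet below builds the same pattern both ways and then builds one from runtime text. The helpers escapeRegExp and countOccurrences are hypothetical names introduced only for this illustration.

```javascript
// The same pattern written both ways: digits, a literal dot, more digits.
const literal = /\d+\.\d+/;
const constructed = new RegExp("\\d+\\.\\d+"); // each backslash doubled inside the string

console.log(literal.source === constructed.source); // true

// Hypothetical helper: escape special characters so runtime text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count literal occurrences of a term that is only known at runtime.
function countOccurrences(text, term) {
  const pattern = new RegExp(escapeRegExp(term), "g");
  return (text.match(pattern) || []).length;
}

console.log(countOccurrences("a.b a-b a.b", "a.b")); // 2
console.log(new RegExp("a.b").test("a-b"));          // true, the unescaped dot matches "-"
```

Escaping matters whenever the pattern text comes from a user or a variable; without it, characters such as . or + keep their special meaning.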
Flags: The eight flags modify regex behaviour globally — i ignores case, g finds all matches, m makes anchors match each line, s makes . include newlines, d adds match indices, u enables Unicode, v enables Unicode set operations, and y enables sticky matching.
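A short illustration of five of these flags on one sample string (the variable names are made up for this sketch; d, u, and v are left out to keep it small):

```javascript
const lines = "Cat\ncat\nCAT";

// No flags: case-sensitive, first match only.
console.log(lines.match(/cat/).index);       // 4

// g + i: every match, ignoring case.
console.log(lines.match(/cat/gi));           // ["Cat", "cat", "CAT"]

// m: ^ and $ match at the start and end of each line.
console.log(lines.match(/^cat$/gim).length); // 3

// s: the dot also matches a newline.
console.log(/a.b/.test("a\nb"));             // false
console.log(/a.b/s.test("a\nb"));            // true

// y: the match must start exactly at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;
console.log(sticky.test(lines));             // true
sticky.lastIndex = 3;
console.log(sticky.test(lines));             // false, index 3 is the newline
```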
Character Classes: Square-bracket character classes [...] match any single character from a set or range, with ^ inside for negation, - for ranges, and shorthand classes \d, \w, \s (and their uppercase negations) covering the most common cases.
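To make the set, range, and negation rules concrete, here is a small sketch run against one made-up sample string:

```javascript
const sample = "a-b_c 42";

console.log(sample.match(/[a-z]/g));     // ["a", "b", "c"]
console.log(sample.match(/[a-z-]/g));    // ["a", "-", "b", "c"]  (trailing hyphen is literal)
console.log(sample.match(/[^a-z\s]/g));  // ["-", "_", "4", "2"]  (anything but a-z or whitespace)
console.log(sample.match(/\d/g));        // ["4", "2"]
console.log(sample.match(/\w+/g));       // ["a", "b_c", "42"]    (\w includes the underscore)
```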
Meta Characters: Meta characters give regex its power: . matches anything but a newline, \b marks word boundaries, | creates alternatives, backslash \ escapes special characters, and shorthand classes \d, \w, \s match entire categories of characters.
Assertions: Assertions are zero-width checks — ^ and $ anchor to string/line boundaries, \b marks word edges, and lookaheads (?=...) / (?!...) and lookbehinds (?<=...) / (?<!...) impose conditions on surrounding text without consuming it.
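The zero-width behaviour is easiest to see in a few one-liners. This is only a sketch; the sample text and the password rule are invented for illustration.

```javascript
const prices = "Cost: $40, €30, $15 off";

// Lookbehind: digits that directly follow a dollar sign (the $ is not part of the match).
console.log(prices.match(/(?<=\$)\d+/g));     // ["40", "15"]

// Negative lookbehind: whole numbers NOT preceded by a dollar sign.
console.log(prices.match(/(?<!\$)\b\d+/g));   // ["30"]

// Lookahead: digits that are followed by " off".
console.log(prices.match(/\d+(?= off)/g));    // ["15"]

// Zero-width in action: the match is an empty position, so nothing is removed.
console.log("abc".replace(/(?=b)/, "-"));     // "a-bc"

// Stacked lookaheads as independent conditions (assumed rule: a digit, a lowercase letter, 8+ chars).
const passwordRule = /^(?=.*\d)(?=.*[a-z]).{8,}$/;
console.log(passwordRule.test("abc12345"));   // true
console.log(passwordRule.test("abcdefgh"));   // false, no digit
```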
Quantifiers: Quantifiers control repetition with * (zero or more), + (one or more), ? (optional), and {n}, {n,}, {n,m} (exact, minimum, and bounded counts), and appending ? to any quantifier switches it from greedy (match as much as possible) to lazy (match as little as possible).
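A quick sketch of the greedy/lazy difference, using a made-up HTML fragment:

```javascript
const html = "<b>one</b> and <b>two</b>";

// Greedy: .* runs to the LAST closing tag.
console.log(html.match(/<b>.*<\/b>/)[0]);   // "<b>one</b> and <b>two</b>"

// Lazy: .*? stops at the FIRST closing tag.
console.log(html.match(/<b>.*?<\/b>/)[0]);  // "<b>one</b>"
console.log(html.match(/<b>.*?<\/b>/g));    // ["<b>one</b>", "<b>two</b>"]

// The same switch applies to counted quantifiers.
console.log("aaaa".match(/a{2,3}/)[0]);     // "aaa"
console.log("aaaa".match(/a{2,3}?/)[0]);    // "aa"
```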
Groups & Patterns: Capturing groups (...) extract matched substrings (read by number as match[n], or by name from match.groups when written as (?<name>...)), non-capturing groups (?:...) group without capturing, backreferences \1 / \k<name> re-match previously captured text, and combining these with anchors and quantifiers builds complete validation patterns.
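The group features fit together roughly as follows. This is a sketch with invented sample data; the pattern names are hypothetical.

```javascript
// Named capturing groups inside an anchored validation pattern.
const isoDate = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const match = isoDate.exec("2024-03-15");

console.log(match[1]);            // "2024"  (by number)
console.log(match.groups.month);  // "03"    (by name)

// Named groups in a replacement string.
console.log("2024-03-15".replace(isoDate, "$<day>/$<month>/$<year>")); // "15/03/2024"

// Numbered backreference: the same word twice in a row.
const repeatedWord = /\b(\w+) \1\b/;
console.log(repeatedWord.exec("this is is a test")[0]); // "is is"

// Named backreference: the closing quote must equal the opening quote.
const quoted = /(?<quote>["']).*?\k<quote>/;
console.log(quoted.exec(`say "it's fine" now`)[0]);     // the text from " to the next "

// Non-capturing group: groups for the quantifier, adds no entry to the result.
console.log(/(?:ab)+/.exec("ababab").length);           // 1
```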
RegExp Object: The RegExp object exposes source, flags, and per-flag boolean properties, while lastIndex tracks the current search position for global/sticky regexes — resetting it prevents the alternating true/false bug when reusing global patterns.
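The alternating true/false bug can be reproduced in a few lines. The helper safeTest is a hypothetical name for one possible fix, not a built-in.

```javascript
const hasDigit = /\d/g;

console.log(hasDigit.source, hasDigit.flags, hasDigit.global); // \d g true

// Reusing a global regex: each successful test() moves lastIndex forward.
const results = [
  hasDigit.test("a1"), // true,  lastIndex becomes 2
  hasDigit.test("a1"), // false, search started at 2; lastIndex resets to 0
  hasDigit.test("a1"), // true again
];
console.log(results);  // [true, false, true]

// Hypothetical helper: reset lastIndex before every test.
function safeTest(regex, text) {
  regex.lastIndex = 0;
  return regex.test(text);
}

console.log([safeTest(hasDigit, "a1"), safeTest(hasDigit, "a1")]); // [true, true]
```

Dropping the g flag is the other common fix when only a yes/no answer is needed, since a non-global, non-sticky regex always searches from the start.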
Methods: Seven methods bridge regex and strings, namely test for boolean checks, exec/matchAll for full match details with groups, match for all matched strings, search for the match position, replace (and its replaceAll variant) for substitution, with function callbacks for dynamic replacements, and split for pattern-based string splitting.
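As a side-by-side sketch, here is each method applied to one invented sentence (the variable names are assumptions for this example):

```javascript
const text = "Call 080-123-4567 or 090-765-4321";
const phone = /(\d{3})-(\d{3})-(\d{4})/g;

// test: yes/no
console.log(/\d{3}-\d{3}-\d{4}/.test(text));           // true

// search: index of the first match
console.log(text.search(/\d/));                        // 5

// exec: one match with its groups and index
const first = /(\d{3})-(\d{3})-(\d{4})/.exec(text);
console.log(first[1], first.index);                    // 080 5

// match with g: all matched strings, no groups
console.log(text.match(phone));                        // ["080-123-4567", "090-765-4321"]

// matchAll: every match WITH its groups
const areaCodes = [...text.matchAll(phone)].map((m) => m[1]);
console.log(areaCodes);                                // ["080", "090"]

// replace with a function: dynamic replacement
const formatted = text.replace(phone, (full, a, b, c) => `(${a}) ${b}-${c}`);
console.log(formatted);                                // "Call (080) 123-4567 or (090) 765-4321"

// split: pattern-based splitting
console.log("a, b;c".split(/\s*[,;]\s*/));             // ["a", "b", "c"]
```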
📘 Built from W3Schools.com: js_regexp · js_regexp_flags · js_regexp_characters · js_regexp_meta_characters · js_regexp_assertions · js_regexp_quantifiers · js_regexp_patterns · js_regexp_objects · js_regexp_methods
Framework: Understand → Practice → Create