Regular Expressions

Beamtalk provides regex support through both String methods and the Regex class. Most operations accept either a pattern string or a compiled Regex object.

Matching

matchesRegex: tests whether a pattern matches anywhere in a string. Use ^ and $ to anchor the match to the start or end:

"hello world" matchesRegex: "hello"      // => true
"hello world" matchesRegex: "^world"     // => false
"hello world" matchesRegex: "world$"     // => true
"abc123" matchesRegex: "^[a-z]+[0-9]+$"  // => true

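To check that an entire string has a given shape, rather than that the pattern occurs somewhere inside it, anchor both ends with ^ and $. A minimal sketch, relying on the search behavior shown in the examples above; the input strings are made up for illustration:

// Unanchored: true because digits occur somewhere in the string.
"order-42" matchesRegex: "[0-9]+"    // => true
// Anchored at both ends: the whole string must consist of digits.
"order-42" matchesRegex: "^[0-9]+$"  // => false
"42" matchesRegex: "^[0-9]+$"        // => true
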
Finding matches

firstMatch: returns the first match, or nil if none:

"hello world" firstMatch: "[aeiou]"       // => e
"hello world" firstMatch: "[0-9]+"        // => nil
"abc 123 def 456" firstMatch: "[0-9]+"    // => 123

allMatches: returns all matches as a list:

matches := "cat and cat" allMatches: "cat"  // => _
matches size                                // => 2
vowels := "hello" allMatches: "[aeiou]"  // => _
vowels size                              // => 2
noMatch := "xyz" allMatches: "[0-9]"  // => _
noMatch size                          // => 0
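
firstMatch: and allMatches: can be compared directly on one input. The sketch below reuses the string from the firstMatch: example; the variable name numbers is illustrative, and the expected size follows from the two digit runs in the string:

"abc 123 def 456" firstMatch: "[0-9]+"             // => 123
numbers := "abc 123 def 456" allMatches: "[0-9]+"  // => _
numbers size                                       // => 2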

Replacement

replaceRegex:with: replaces the first match:

"hello world" replaceRegex: "[aeiou]" with: "*"  // => h*llo world

replaceAllRegex:with: replaces all matches:

"hello world" replaceAllRegex: "[aeiou]" with: "*"  // => h*ll* w*rld

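The difference between the two messages is easiest to see on one input. A small sketch with illustrative strings; the last line assumes each run of spaces counts as a single match, as it does for " +" in standard regex syntax:

"2024 01 15" replaceRegex: " " with: "-"         // => 2024-01 15
"2024 01 15" replaceAllRegex: " " with: "-"      // => 2024-01-15
"hello   world" replaceAllRegex: " +" with: " "  // => hello world
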
Splitting

splitRegex: splits a string on a pattern. Adjacent delimiters produce an empty string, which is why the first example below has four parts:

parts := "one,two,,three" splitRegex: ","  // => _
parts size                                 // => 4
words := "hello   world" splitRegex: " +"  // => _
words size                                 // => 2
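
When empty strings from adjacent delimiters are not wanted, one option is to let the pattern match a whole run of delimiters. A sketch with illustrative inputs and variable names, assuming the same splitting behavior as the examples above:

// ",+" treats ",," as one delimiter, so no empty string appears.
fields := "one,two,,three" splitRegex: ",+"  // => _
fields size                                  // => 3
// A character class allows several delimiters, here with optional trailing spaces.
items := "a, b;c" splitRegex: "[,;] *"       // => _
items size                                   // => 3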

Compiled regex objects

For repeated use, compile a pattern once with the Regex from: message. It returns a Result, so call unwrap to get the Regex:

r := (Regex from: "[0-9]+") unwrap  // => _
"abc 123 def" firstMatch: r         // => 123
"no digits" matchesRegex: r         // => false
"99 bottles" matchesRegex: r        // => true

Access the original pattern with source:

r := (Regex from: "[a-z]+") unwrap  // => _
r source                            // => [a-z]+
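
The introduction says that most operations accept a compiled Regex, but only firstMatch: and matchesRegex: are shown doing so. The sketch below assumes that allMatches:, replaceAllRegex:with:, and splitRegex: do as well; treat the expected values as what those messages would return under that assumption, with digits, found, and parts as illustrative names:

// Assumption: these three messages accept a compiled Regex in place of a pattern string.
digits := (Regex from: "[0-9]+") unwrap     // => _
found := "a1b22c" allMatches: digits        // => _
found size                                  // => 2
"a1b22c" replaceAllRegex: digits with: "_"  // => a_b_c
parts := "a1b22c" splitRegex: digits        // => _
parts size                                  // => 3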

Case-insensitive matching

Matching is case-sensitive by default. To ignore case, pass #(#caseless) as the options: argument:

"Hello" matchesRegex: "hello"                     // => false
"Hello" matchesRegex: "hello" options: #(#caseless)  // => true

The same options can be supplied when compiling, through Regex from:options: as follows:

r := (Regex from: "hello" options: #(#caseless)) unwrap  // => _
"HELLO WORLD" matchesRegex: r                            // => true
"HELLO WORLD" firstMatch: r                              // => HELLO

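A compiled case-insensitive regex is also a way to ignore case in operations for which this chapter shows no options: variant. A sketch, assuming allMatches: and replaceAllRegex:with: accept a compiled Regex (only firstMatch: and matchesRegex: are shown doing so); catRegex and found are illustrative names:

// Assumption: allMatches: and replaceAllRegex:with: accept a compiled Regex.
catRegex := (Regex from: "cat" options: #(#caseless)) unwrap  // => _
found := "Cat cat CAT" allMatches: catRegex                   // => _
found size                                                    // => 3
"Cat cat CAT" replaceAllRegex: catRegex with: "dog"           // => dog dog dog
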
Error handling

Regex from: does not return the Regex directly but a Result. For an invalid pattern that Result is an error, which isError detects:

bad := Regex from: "[invalid"  // => _
bad isError                    // => true
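
Because Regex from: returns a Result in both cases, the same isError check works before unwrapping a pattern that may or may not be valid. A minimal sketch; this chapter does not show what unwrap does on an error Result, so the example checks first, and the names result and digits are illustrative:

result := Regex from: "[0-9]+"  // => _
result isError                  // => false
digits := result unwrap         // => _
"abc 123" firstMatch: digits    // => 123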

Summary

Matching:

string matchesRegex: pattern                    → Boolean
string matchesRegex: pattern options: #(#caseless)  → Boolean

Finding:

string firstMatch: pattern    → String or nil
string allMatches: pattern    → List of Strings

Replacing:

string replaceRegex: pattern with: replacement      → String (first)
string replaceAllRegex: pattern with: replacement   → String (all)

Splitting:

string splitRegex: pattern    → List of Strings

Compiled regex:

Regex from: pattern                       → Result<Regex>
Regex from: pattern options: #(#caseless) → Result<Regex>
regex source                              → String

Exercises

1. Digit censoring. Use replaceAllRegex:with: to replace all digits in "My phone is 555-1234" with "*".

Hint
"My phone is 555-1234" replaceAllRegex: "[0-9]" with: "*"
// => My phone is ***-****

2. Extract numbers. Use allMatches: to find all numbers in the string "Order 42 has 3 items at $15 each". How many matches are there?

Hint
matches := "Order 42 has 3 items at $15 each" allMatches: "[0-9]+"
matches size    // => 3  (42, 3, 15)

3. Split on whitespace. Use splitRegex: to split "hello   world   foo" on one or more spaces. Compare with the words message: do they produce the same result?

Hint
"hello   world   foo" splitRegex: " +"    // => ["hello", "world", "foo"]
"hello   world   foo" words               // => ["hello", "world", "foo"]

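Both produce the same result for this input. words is a convenience method that splits on any whitespace, whereas the pattern " +" matches only space characters, so the two can differ on strings that contain tabs or newlines.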

Next: Chapter 20 — JSON