Regexp
String processing is difficult. We must account for sequences and ranges of characters. Data is often imperfect and has inconsistencies.
With Regexp, Ruby's regular expression class, we use a compact pattern language to better handle this data. Ruby provides an operator, "=~", to make regular expressions easier to use.
The match method applies a regular expression to a string parameter. If the regular expression does not fit, match returns nil.
Here we iterate over a string array. We see whether each string matches the pattern. If the result of match is not nil, the match succeeded and we display it.

    values = ["123", "abc", "456"]

    # Iterate over each string.
    values.each do |v|
      # Try to match this pattern.
      m = /\d\d\d/.match(v)
      # If match is not nil, display it.
      if m
        puts m
      end
    end

Output:

    123
    456

Pattern:

    \d      A digit character 0-9.
    \d\d\d  Three digit characters.
The "=~" matching operator performs a task similar to the match method. We place the regular expression on the left side, then use the matching operator. A string goes on the right side.
If the operator returns nil, the match failed. But if the matching was successful, an integer is returned: the index where the match starts. We can use this result in an if-statement to indicate success.

    # The string input.
    input = "plutarch"

    # If string matches this pattern, display something.
    if /p.*/ =~ input
      puts "lives"
    end

Output:

    lives

Pattern:

    p   Matches lowercase letter p.
    .*  Matches zero or more characters of any type (except a newline).
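To make the integer result more concrete, here is a minimal sketch that prints what the matching operator returns. The helper name match_index is hypothetical and only exists for illustration.

```ruby
# Hypothetical helper (not part of Ruby): returns the index where the
# pattern first matches, or nil when there is no match.
def match_index(pattern, input)
  pattern =~ input
end

p match_index(/tar/, "plutarch")  # match starts at index 3
p match_index(/p.*/, "plutarch")  # match starts at index 0
p match_index(/xyz/, "plutarch")  # no match, so nil
```

A match at index 0 still passes an if-test, because 0 is truthy in Ruby. Only nil (and false) count as failure.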
A Regexp is by default case-sensitive. We can modify this by specifying a special flag, "i", which stands for "ignore case" or case-insensitive.
    # A string array.
    names = ["Marcus Aurelius", "sam allen"]

    # Test each name.
    names.each do |name|
      # Use case-insensitive regular expression.
      if /\ a/i =~ name
        puts name
      end
    end

Output:

    Marcus Aurelius
    sam allen

Pattern:

    "\ "  Matches a space.
    "a"   Matches the letter "a".
    i     Specifies the expression is case-insensitive.
Replace
With gsub we replace characters in a string based on a pattern. It works like the replace() methods found in other languages, but it accepts regular expressions and replaces every match in the string.
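With gsub we call a string method and pass it a regular expression. Here the pattern a+ matches one or more "a" letters, and each run is replaced with a single "a". The original string is not modified; gsub returns a new string.

    value = "caaat"

    # Replace multiple "a" letters with one.
    result = value.gsub(/a+/, "a")
    puts value
    puts result

Output:

    caaat
    cat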
Split
The split method also accepts regular expressions. We use a Regexp to specify the delimiter pattern. The parts of the string that are separated by matches of the pattern are returned in an array.
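As a small sketch of this, the example below splits a line on a comma or semicolon along with any surrounding whitespace. The helper name split_fields and the sample input are assumptions made for illustration.

```ruby
# Hypothetical helper: split on a comma or semicolon, also consuming
# any whitespace on either side of the delimiter.
def split_fields(line)
  line.split(/\s*[,;]\s*/)
end

# The delimiters here are inconsistent, which is where a Regexp helps.
p split_fields("apple, banana;cherry ,  date")
```

A plain string delimiter such as "," would leave stray spaces in the parts. The pattern absorbs them as part of the delimiter.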
With split, we can count words in a string. We simply split on non-word characters and return the length of the array.
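A rough sketch of this word count follows. The method name count_words is hypothetical. The extra step that discards empty strings is an addition not described above: it covers input that begins with a non-word character, where split returns a leading empty string.

```ruby
# Hypothetical helper: count words by splitting on runs of non-word
# characters (\W+) and taking the length of the resulting array.
def count_words(text)
  parts = text.split(/\W+/)
  # Assumption: drop the empty string that split produces when the
  # text starts with a non-word character.
  parts.reject(&:empty?).length
end

puts count_words("To be, or not to be.")
```

This is only an approximation. A contraction such as "don't" is counted as two words, because the apostrophe is a non-word character.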
Often we require no regular expressions. We just use string methods. But in many cases, complexities and inconsistencies surface. Regexp then becomes a better approach.