re.sub, subn Methods
This page was last reviewed on Feb 16, 2024.
Re.sub. In regular expressions, sub stands for substitution. The re.sub method replaces every match of a pattern in a string. The replacement can be a plain string, or a method (or lambda) that re.sub calls once for each match.
This method can modify strings in complex ways. We can apply transformations, such as changing numbers within a string. The syntax can be hard to follow.
This example introduces a method "multiply" that receives a match. It accesses group(0) and converts it into an integer. It multiplies that number by two, and converts it to a string.
Argument 1 The first argument to re.sub() is "\d+" which means one or more digit chars.
Argument 2 The second argument is the multiply method name. This is the method that re.sub calls for each match it replaces.
Argument 3 We pass a string for processing. In this example, we use a sample string with several 2-digit numbers.
Result The re.sub method matched each group of digits (each number) and the multiply method doubled it.
import re

def multiply(m):
    # Convert group 0 to an integer.
    v = int(m.group(0))
    # Multiply integer by 2.
    # ... Convert back into string and return it.
    return str(v * 2)

# Use pattern of 1 or more digits (raw string avoids escape warnings).
# ... Use multiply method as second argument.
result = re.sub(r"\d+", multiply, "10 20 30 40 50")
print(result)
20 40 60 80 100
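To see how re.sub invokes the method, the sketch below records what each call receives. The `calls` list and the shorter input string are illustrative additions, not part of the original example.

```python
import re

# Hypothetical log of (matched text, start index) for each call.
calls = []

def multiply(m):
    # Record what this call received before building the replacement.
    calls.append((m.group(0), m.start()))
    return str(int(m.group(0)) * 2)

result = re.sub(r"\d+", multiply, "10 20 30")
print(result)  # 20 40 60
print(calls)   # [('10', 0), ('20', 3), ('30', 6)]
```

The method runs once per match, in left-to-right order, and its return value takes the place of the matched text.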
String. Re.sub can replace a pattern match with a simple string. No method call or lambda is required. Here we replace a pattern with the string "ring".
import re

# An example string.
v = "running eating reading"

# Replace words starting with "r" and ending in "ing"
# ... with a new string.
v = re.sub(r"r.*?ing", "ring", v)
print(v)
ring eating ring
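One caveat: the pattern r"r.*?ing" is not limited to a single word, because "." also matches spaces. The sketch below compares it with a stricter pattern using word boundaries. The stricter pattern and the second sample string are assumptions added for illustration.

```python
import re

text = "rest eating reading"

# Original-style pattern: the match may start in one word and end in another.
loose = re.sub(r"r.*?ing", "ring", text)

# Assumed stricter variant: \b anchors the match to one whole word,
# and \w* only allows word characters between "r" and "ing".
strict = re.sub(r"\br\w*ing\b", "ring", text)

print(loose)   # ring ring
print(strict)  # rest eating ring
```

On the original sample string both patterns give the same result, so the difference only appears when a word starting with "r" does not itself end in "ing".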
Subn. Usually re.sub() is sufficient. But another option exists. The re.subn method has an extra feature. It returns a tuple with a count of substitutions in the second element.
Tip If you must know the number of substitutions made by re.sub, using re.subn is an ideal choice.
However If your program has no use for this information, using re.sub is probably best. It is simpler and more commonly used.
import re

def add(m):
    # Convert.
    v = int(m.group(0))
    # Add 1.
    return str(v + 1)

# Call re.subn.
result = re.subn(r"\d+", add, "10 20 30 40 50")
print("Result string:", result[0])
print("Number of substitutions:", result[1])
Result string: 11 21 31 41 51
Number of substitutions: 5
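A common use of the count is to detect whether anything changed at all. The following is a minimal sketch; the helper name `collapse_spaces` and its space-collapsing task are hypothetical and chosen only to show the tuple being unpacked.

```python
import re

def collapse_spaces(text):
    # re.subn returns a tuple: (new string, number of substitutions).
    new_text, count = re.subn(r" {2,}", " ", text)
    # Report whether any substitution was made.
    return new_text, count > 0

cleaned, changed = collapse_spaces("a  b   c")
print(cleaned, changed)  # a b c True

same, changed_again = collapse_spaces("a b c")
print(same, changed_again)  # a b c False
```

Unpacking the tuple into two names is often easier to read than indexing with result[0] and result[1].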
Lambda. A method declared with def can be passed to re.sub by name. But a lambda offers a more terse alternative. Here we specify a lambda expression directly within the re.sub argument list.
Here We add the string "ing" to the end of all words within the input string. Additional logic could be used to make the results better.
Tip A gerund form of a verb cannot be made this way all the time. Sometimes other spelling changes are needed.
import re

# The input string (named text to avoid shadowing the built-in input).
text = "laugh eat sleep think"

# Use lambda to add "ing" to all words.
result = re.sub(r"\w+", lambda m: m.group(0) + "ing", text)

# Display result.
print(result)
laughing eating sleeping thinking
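As an illustration of the additional logic mentioned above, the sketch below swaps the lambda for a named method that handles one spelling change. The rule (drop a single trailing "e") is a simplified assumption, not a complete treatment of English spelling.

```python
import re

def to_gerund(m):
    word = m.group(0)
    # Assumed rule: drop one trailing "e" (make -> making),
    # but keep "ee" endings (see -> seeing) and very short words (be -> being).
    if word.endswith("e") and not word.endswith("ee") and len(word) > 2:
        word = word[:-1]
    return word + "ing"

gerunds = re.sub(r"\w+", to_gerund, "laugh make see write")
print(gerunds)  # laughing making seeing writing
```

Other cases, such as doubling a final consonant (run, running), would need further rules in the same method.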
Dictionary example. The re.sub method can be used with a dictionary. In the method provided to re.sub, we access a dictionary to influence our action.
Here We replace all known "plant" strings with the string PLANT. For any other word, modify() returns the match unchanged.
import re

plants = {"flower": 1, "tree": 1, "grass": 1}

def modify(m):
    v = m.group(0)
    # If string is in dictionary, return different string.
    if v in plants:
        return "PLANT"
    # Do not change anything.
    return v

# Replace all words found in the dictionary with PLANT.
result = re.sub(r"\w+", modify, "bird flower dog fish tree")
print(result)
bird PLANT dog fish PLANT
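A variation on this idea stores the replacement text as the dictionary value, so one lookup decides both whether and how to replace. This is a sketch under assumed data: the `replacements` mapping and the ANIMAL label are hypothetical.

```python
import re

# Hypothetical mapping from a known word to its replacement text.
replacements = {"flower": "PLANT", "tree": "PLANT", "bird": "ANIMAL"}

def lookup(m):
    word = m.group(0)
    # dict.get returns the word itself when it is not a key,
    # which leaves unknown words unchanged.
    return replacements.get(word, word)

labeled = re.sub(r"\w+", lookup, "bird flower dog fish tree")
print(labeled)  # ANIMAL PLANT dog fish PLANT
```

With this layout, adding a new category only requires a new dictionary entry, not a change to the method.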
Summary. Re.sub, and its friend re.subn, can replace substrings in arbitrary ways. A method can test the contents of a match and change it using any algorithm.
And with a pattern, we can specify nearly any textual sequence to match. We can change a string to any other string (with a sufficient algorithm).
