Python re.match Performance

Test the performance of the re.match method against a custom def. Perform a benchmark.

Re, performance. With re.match we test whether the start of a string matches a pattern. Regular expressions usually carry a performance cost. But is it significant? I tested a custom method, written with for and if, against re.match.

Example. We introduce two methods: stringmatch and stringmatch_re. Both test for strings shaped like "cat": the first letter must be "c", the last letter must be "t", and there must be one or more "a" letters in the middle.

Stringmatch: This method uses the if-statement to check the length and individual characters. It uses a for-loop to check the middle.

Stringmatch_re: This method is shorter. It uses the pattern "ca+t" to check for valid strings.

Python program that tests strings, uses re

    import re

    def stringmatch(s):
        # Check for "ca+t" with if-statements and loop.
        if len(s) >= 3 and s[0] == 'c' and s[len(s) - 1] == 't':
            # Every character between the first and last must be 'a'.
            for v in range(1, len(s) - 1):
                if s[v] != 'a':
                    return False
            return True
        return False

    def stringmatch_re(s):
        # Check for "ca+t" with re.
        m = re.match(r"ca+t", s)
        if m:
            return True
        return False

    # Test these strings.
    tests = ["ct", "cat", "caaat", "dog", "car"]
    for t in tests:
        print(stringmatch(t), stringmatch_re(t), t)

Output

    False False ct
    True True cat
    True True caaat
    False False dog
    False False car

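Method notes. Both methods return the same values on these test strings. They are not equivalent on every input: re.match anchors only at the start of the string, so stringmatch_re also accepts a string such as "cats", which stringmatch rejects. In many programs, stringmatch_re is a better choice because it is shorter and easier to understand. But it causes a performance loss.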
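A short sketch of how re.match and re.fullmatch treat trailing text for this pattern. The input "cats" is an illustrative string, not one of the test strings, and re.fullmatch assumes Python 3.4 or later.

```python
import re

PATTERN = r"ca+t"

# re.match anchors only at the start, so text after the match is allowed.
prefix_ok = re.match(PATTERN, "cats") is not None

# re.fullmatch requires the entire string to match the pattern.
whole_ok = re.fullmatch(PATTERN, "cats") is not None

print(prefix_ok, whole_ok)
```

If the final "t" must be the last character, as in stringmatch, re.fullmatch is the closer equivalent.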

Performance. I compared the two methods on some test strings. In every Python implementation I tested, stringmatch, which uses no regular expressions, was faster. In Python 3.3, it took less than half the time.

Note: In this experiment, stringmatch returns after finding an invalid length or an invalid start character.

Note 2: An optimized version of stringmatch_re, where the length and first character are checked before calling re.match, might be possible.
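One way the idea in Note 2 might look is sketched below. The name stringmatch_re_fast and the precompiled CAT_PATTERN are assumptions for illustration, not part of the original experiment, and this variant was not timed here, so any speedup is unverified.

```python
import re

# Compile once so the pattern is not fetched from re's cache on every call.
CAT_PATTERN = re.compile(r"ca+t")

def stringmatch_re_fast(s):
    # Hypothetical variant: reject on length or first letter before using re.
    if len(s) < 3 or s[0] != "c":
        return False
    return CAT_PATTERN.match(s) is not None

# Same test strings as the example program.
for t in ["ct", "cat", "caaat", "dog", "car"]:
    print(stringmatch_re_fast(t), t)
```

The early checks only help on inputs that fail them. For strings that start with "c" and are long enough, this version does strictly more work than calling re.match alone.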

Python program that times methods

    import re
    import time

    def stringmatch(s):
        # Check for "ca+t" with if-statements and loop.
        if len(s) >= 3 and s[0] == 'c' and s[len(s) - 1] == 't':
            # Every character between the first and last must be 'a'.
            for v in range(1, len(s) - 1):
                if s[v] != 'a':
                    return False
            return True
        return False

    def stringmatch_re(s):
        # Check for "ca+t" with re.
        m = re.match(r"ca+t", s)
        if m:
            return True
        return False

    print(time.time())

    # Version 1: string if, for
    for i in range(0, 10000000):
        result = stringmatch("ct")
        result = stringmatch("caat")
        result = stringmatch("dooog")

    print(time.time())

    # Version 2: re.match
    for i in range(0, 10000000):
        result = stringmatch_re("ct")
        result = stringmatch_re("caat")
        result = stringmatch_re("dooog")

    print(time.time())

Output

    1411309406.96144
    1411309430.354504    stringmatch    = 23.39 s
    1411309480.849815    stringmatch_re = 50.50 s
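The same comparison can also be sketched with the standard timeit module, which handles the timing loop itself. The helper run and the iteration count N are assumptions chosen to keep the run short; the numbers printed will differ from the figures above and depend on the machine and Python version.

```python
import re
import timeit

def stringmatch(s):
    # Loop-based check for "ca+t": all middle letters must be 'a'.
    if len(s) >= 3 and s[0] == 'c' and s[-1] == 't':
        for v in range(1, len(s) - 1):
            if s[v] != 'a':
                return False
        return True
    return False

def stringmatch_re(s):
    # Regular-expression check for "ca+t" at the start of the string.
    return re.match(r"ca+t", s) is not None

def run(func):
    # Hypothetical helper: call func on the three benchmark strings.
    for s in ("ct", "caat", "dooog"):
        func(s)

N = 100000  # assumed iteration count, far smaller than the original 10 million

t_loop = timeit.timeit(lambda: run(stringmatch), number=N)
t_re = timeit.timeit(lambda: run(stringmatch_re), number=N)

print("stringmatch    = %.3f s" % t_loop)
print("stringmatch_re = %.3f s" % t_re)
```

timeit uses a high-resolution clock, which makes it less sensitive to system clock adjustments than subtracting time.time() values.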

Summary. In nearly all of my experiments, replacing a regular expression with an if-statement and loop was an optimization. In a language with extremely slow loops, the imperative approach may occasionally be slower. In Python, that was not the case in these tests.
Dot Net Perls
© 2007-2020 Sam Allen.