
Java Regex Examples (Pattern.matches)

Use Regex: create Pattern and Matcher. Text is tested with regular expressions.

Regex.

An animal leaves footprints in the dirt. We can learn much from these tracks—from the pattern they leave behind. We can match animals to their footprints.

In Java regexes, we match strings (not footprints) to patterns. The pattern "c.t" matches the string "cat". We use things like Pattern.compile and Matcher.

Pattern.matches example.

We call Pattern.matches in a loop. Its first argument is the regular expression's pattern. It also accepts the string we want to test for matches.

And: It returns a boolean. If the entire string matches the pattern, this value equals true. For groups, we instead need to use a Matcher.

Java program that uses Pattern.matches

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {
        // Some strings to test.
        String[] inputs = { "dog", "dance", "cat", "dirt" };

        // Loop over strings and test them.
        for (String input : inputs) {
            boolean b = Pattern.matches("d.+", input);
            System.out.println(b);
        }
    }
}

Output

true
true
false
true

Pattern

d     The lowercase letter "d".
.+    One or more characters of any type.
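Pattern.matches succeeds only when the whole input matches the pattern. The following is a minimal sketch, assuming we also want to detect a match somewhere inside a longer string, which is what Matcher.find does. The class and method names (MatchVersusFind, matchesWhole, containsMatch) are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

class MatchVersusFind {
    // True only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if any substring of the input matches the regex.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("d.+", "a dog"));  // false
        System.out.println(containsMatch("d.+", "a dog")); // true
    }
}
```

The input "a dog" does not begin with "d", so the whole-string test fails, but find() locates "dog" inside it.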

Pattern.compile and Matcher.

Next we learn a faster way to match regular expressions. We use Pattern.compile to create a compiled pattern object.

Then: We call the matcher() method on the pattern instance. This returns a Matcher class instance.

Matches: Finally the matches method is used. This returns true if the entire input string matches the compiled pattern.

Java program that uses Pattern.compile, Matcher

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {
        // Compile this pattern.
        Pattern pattern = Pattern.compile("num\\d\\d\\d");

        // See if this String matches.
        Matcher m = pattern.matcher("num123");
        if (m.matches()) {
            System.out.println(true);
        }

        // Check this String.
        m = pattern.matcher("num456");
        if (m.matches()) {
            System.out.println(true);
        }
    }
}

Output

true
true

Pattern

num       The letters "num" must be present.
\d\d\d    Three digit characters.
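A compiled Pattern is immutable, so one common arrangement is to keep it in a static final field and create a new Matcher for each input. This is a sketch of that arrangement, not the only way to do it; the class name NumCode and the method isNumCode are hypothetical.

```java
import java.util.regex.Pattern;

class NumCode {
    // Compiled once when the class loads, then reused by every call.
    private static final Pattern NUM_CODE = Pattern.compile("num\\d\\d\\d");

    static boolean isNumCode(String input) {
        // Each call gets its own Matcher; only the Pattern is shared.
        return NUM_CODE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumCode("num123")); // true
        System.out.println(isNumCode("num12"));  // false
    }
}
```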

Capturing groups.

Often regular expression patterns use groups to capture parts of strings. Here we use positional groups. We access them by their position (1, 2 or more).

Tip: We create the compiled Pattern and initialize the Matcher like usual. After calling matches() we access groups.

Java program that uses group method

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d+)\\-(\\d+)");

        // Get matcher on this String.
        Matcher m = pattern.matcher("1234-5678");

        // If it matches, get and display group values.
        if (m.matches()) {
            String part1 = m.group(1);
            String part2 = m.group(2);
            System.out.println(part1);
            System.out.println(part2);
        }
    }
}

Output

1234
5678

Pattern

(\d+)    One or more digit characters, in a group.
\-       A hyphen.
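When the text may hold several occurrences of the pattern, a find() loop can visit each one and read its groups in turn. The sketch below assumes that use case; the names RangeFinder and findRanges are hypothetical, and the "start..end" output format is only for display.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeFinder {
    private static final Pattern RANGE = Pattern.compile("(\\d+)-(\\d+)");

    // Collects "start..end" for every digits-hyphen-digits occurrence.
    static List<String> findRanges(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = RANGE.matcher(text);
        while (m.find()) {
            // group(1) and group(2) are the two captured parts of this match.
            results.add(m.group(1) + ".." + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        // Prints [1234..5678, 10..20]
        System.out.println(findRanges("ids 1234-5678 and 10-20"));
    }
}
```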

Named groups.

With names, we easily access specific groups from a matched pattern. We use angle brackets to name groups in the pattern. Then we call group() with a String name argument.

Java program that uses named groups

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {
        // Specify a pattern with named groups.
        Pattern pattern = Pattern.compile("(?<first>..)x(?<second>..)");
        Matcher m = pattern.matcher("c3xp0");

        // Check for matches.
        // ... Then access named groups by their names.
        if (m.matches()) {
            String part1 = m.group("first");
            String part2 = m.group("second");
            System.out.println(part1);
            System.out.println(part2);
        }
    }
}

Output

c3
p0

Pattern

(?<first>..)     Group named "first" with two characters.
x                Letter x.
(?<second>..)    Group named "second" with two characters.
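Named groups can also be referenced in a replacement string with the ${name} syntax. Here is a small sketch that reuses the pattern above to swap the two parts; the class name SwapParts and the method swap are hypothetical.

```java
import java.util.regex.Pattern;

class SwapParts {
    private static final Pattern PARTS =
            Pattern.compile("(?<first>..)x(?<second>..)");

    // Swaps the two named parts around the "x" using ${name} references.
    static String swap(String input) {
        return PARTS.matcher(input).replaceAll("${second}x${first}");
    }

    public static void main(String[] args) {
        System.out.println(swap("c3xp0")); // p0xc3
    }
}
```

Input that contains no match is returned unchanged.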

Pattern.quote.

Characters must be escaped ("quoted") to avoid being seen as metacharacters. For example a star must be escaped to mean an asterisk, not a Kleene closure of "zero or more."

Pattern.quote: This method surrounds a String with "\Q" and "\E". Between these markers, everything is treated as a literal character.

So: We match the star as a star. Without Pattern.quote, a PatternSyntaxException ("Dangling meta character") is thrown.

Java program that uses Pattern.quote

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {
        // Quote this value.
        String value = "*star";
        String quote = Pattern.quote(value);
        System.out.println(value);
        System.out.println(quote);

        // Try matching with quoted value.
        boolean result1 = Pattern.matches(quote, "*star");
        System.out.println(result1);

        // This fails because it was not quoted.
        boolean result2 = Pattern.matches(value, "*star");
        System.out.println(result2);
    }
}

Output

*star
\Q*star\E
true
Exception in thread "main" java.util.regex.PatternSyntaxException:
    Dangling meta character '*' near index 0
*star
^
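Pattern.quote is often most useful when a literal value is only one piece of a larger pattern. The sketch below assumes we want a literal prefix followed by digits; the class name QuotedPrefix and the method prefixThenDigits are hypothetical.

```java
import java.util.regex.Pattern;

class QuotedPrefix {
    // Builds a pattern: the literal prefix followed by one or more digits.
    static Pattern prefixThenDigits(String literalPrefix) {
        return Pattern.compile(Pattern.quote(literalPrefix) + "\\d+");
    }

    public static void main(String[] args) {
        Pattern p = prefixThenDigits("*star");
        System.out.println(p.matcher("*star42").matches()); // true
        System.out.println(p.matcher("star42").matches());  // false
    }
}
```

Only the prefix is quoted, so the "\\d+" part keeps its usual regex meaning.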

Start, end in pattern.

Often in regular expressions we want to match the start or end of strings. Two metacharacters are useful here: the "^" and the "$." These match the start, the end.

Here: A method called startsWithAEndsWithZ tests a String. It returns true if the first char is "a" and the last is "z."

Caution: Testing chars (with startsWith, endsWith, charAt) is more efficient. But it becomes harder to code when requirements change.

Java program that tests start, end in pattern

import java.util.regex.Pattern;

public class Program {
    public static boolean startsWithAEndsWithZ(String value) {
        // Test start and end characters.
        return Pattern.matches("^a.*z$", value);
    }

    public static void main(String[] args) {
        String[] values = { "a123z", "b123z", "az", "aq", "aza" };

        // Loop over and test these Strings.
        for (String value : values) {
            System.out.print(value);
            System.out.print(' ');
            System.out.println(startsWithAEndsWithZ(value));
        }
    }
}

Output

a123z true
b123z false
az true
aq false
aza false

Pattern

^     Matches start of string.
a     Lowercase a.
.*    Zero or more characters.
z     Lowercase z.
$     Matches end of string.
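For comparison, the char-testing approach mentioned in the caution could look like the sketch below. It assumes single-line input, since the "." in the regex does not match line terminators by default; the class name StartEndCheck is hypothetical.

```java
class StartEndCheck {
    // Same intent as the pattern "^a.*z$", written with String methods.
    // Assumes the value has no line breaks.
    static boolean startsWithAEndsWithZ(String value) {
        return value.startsWith("a") && value.endsWith("z");
    }

    public static void main(String[] args) {
        String[] values = { "a123z", "b123z", "az", "aq", "aza" };
        for (String value : values) {
            System.out.println(value + " " + startsWithAEndsWithZ(value));
        }
    }
}
```

For the five test values this gives the same true and false results as the regex version.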

A benchmark.

This benchmark compares the performance of using Pattern.compile (and the matches method) with Pattern.matches. With compile() we reuse the same pattern many times.

Result: Using compile() and a Matcher is a clear performance boost. In this run it was roughly three times as fast as Pattern.matches (31 ms versus 90 ms).

Java program that benchmarks Matcher, Pattern.matches

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) throws Exception {
        // ... Compile.
        Pattern pattern = Pattern.compile("num\\d\\d\\d");

        long t1 = System.currentTimeMillis();

        // ... Use Matcher with compiled pattern.
        for (int i = 0; i < 100000; i++) {
            Matcher m = pattern.matcher("num123");
            if (!m.matches()) {
                throw new Exception();
            }
        }

        long t2 = System.currentTimeMillis();

        // ... Use Pattern.matches method.
        for (int i = 0; i < 100000; i++) {
            if (!Pattern.matches("num\\d\\d\\d", "num123")) {
                throw new Exception();
            }
        }

        long t3 = System.currentTimeMillis();

        // ... Times.
        System.out.println(t2 - t1);
        System.out.println(t3 - t2);
    }
}

Output

31 ms, Pattern.compile, Matcher
90 ms, Pattern.matches
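Timings like these depend on the machine and on JIT warm-up. The following is a rough sketch of a variation that uses System.nanoTime and runs an untimed warm-up pass first. The class name RegexTiming and the helper time are hypothetical, and the numbers it prints should be read as indicative only.

```java
import java.util.regex.Pattern;

class RegexTiming {
    // Runs the task the given number of times and returns elapsed nanoseconds.
    static long time(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("num\\d\\d\\d");
        Runnable compiled = () -> pattern.matcher("num123").matches();
        Runnable uncompiled = () -> Pattern.matches("num\\d\\d\\d", "num123");

        // Untimed warm-up pass, giving the JIT a chance to optimize both paths.
        time(compiled, 100000);
        time(uncompiled, 100000);

        System.out.println("compiled:   " + time(compiled, 100000) / 1_000_000 + " ms");
        System.out.println("uncompiled: " + time(uncompiled, 100000) / 1_000_000 + " ms");
    }
}
```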

Performance, named groups.

We reference groups with names or indexes using the group method on Matcher. In this test, named accesses are slower. Using indexes, like 1 or 2, is faster.

So: Unless named groups in a Regex make the program much clearer, it is a better choice to use indexes to access groups.

Java program that times named groups, matcher

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {
        // ... Compile. The second group uses \w+ so it matches the letters.
        Pattern pattern1 = Pattern.compile("(?<digitpart>\\d\\d),(?<letterpart>\\w+)");
        Pattern pattern2 = Pattern.compile("(\\d\\d),(\\w+)");

        long t1 = System.currentTimeMillis();

        // ... Use pattern with named groups.
        for (int i = 0; i < 200000; i++) {
            Matcher m = pattern1.matcher("34,cat");
            if (m.matches()) {
                String part1 = m.group("digitpart");
                String part2 = m.group("letterpart");
                // Compare String contents with equals, not references.
                if (!part1.equals("34") || !part2.equals("cat")) {
                    System.out.println(false);
                    break;
                }
            }
        }

        long t2 = System.currentTimeMillis();

        // ... Use pattern with indexed (ordinal) groups.
        for (int i = 0; i < 200000; i++) {
            Matcher m = pattern2.matcher("34,cat");
            if (m.matches()) {
                String part1 = m.group(1);
                String part2 = m.group(2);
                if (!part1.equals("34") || !part2.equals("cat")) {
                    System.out.println(false);
                    break;
                }
            }
        }

        long t3 = System.currentTimeMillis();

        // ... Times.
        System.out.println(t2 - t1);
        System.out.println(t3 - t2);
    }
}

Output (timings vary by machine and JVM)

44 ms, group(name)
21 ms, group(index)

Split, Pattern.

A split method is available on Pattern instances. This lets us split based on a Regex delimiter. The Pattern can be compiled once and reused many times.

Java program that uses split, Pattern

import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {
        String line = "cat, dog, rabbit--100";

        // Compile a Pattern that indicates a delimiter.
        Pattern p = Pattern.compile("\\W+");

        // Split a String based on the delimiter pattern.
        String[] elements = p.split(line);
        for (String element : elements) {
            System.out.println(element);
        }
    }
}

Output

cat
dog
rabbit
100

Pattern

\W+    One or more non-word characters.
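Pattern.split also has an overload that takes a limit, which caps the number of parts returned. A minimal sketch, assuming we only want the first element separated from the rest; the class name SplitLimit and the method splitAtMost are hypothetical.

```java
import java.util.regex.Pattern;

class SplitLimit {
    private static final Pattern DELIMITER = Pattern.compile("\\W+");

    // Splits into at most `limit` parts; the last part keeps the remainder.
    static String[] splitAtMost(String line, int limit) {
        return DELIMITER.split(line, limit);
    }

    public static void main(String[] args) {
        for (String part : splitAtMost("cat, dog, rabbit--100", 2)) {
            System.out.println(part);
        }
        // Prints "cat" and then "dog, rabbit--100".
    }
}
```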

Pattern.COMMENTS.

With this flag we can use comments in a regular expression pattern. This can help make larger regular expressions easier to read and maintain.

Tip: A comment starts with a pound sign (hash) and ends at a newline.

Tip 2: In comments mode, whitespace in the pattern is ignored. So to match a space we cannot type a literal space; here we use "\W", which matches any non-word character.

Java program that uses Pattern.COMMENTS flag

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Program {
    final static String example = "#Match line string\n"
            + "line\\W"
            + "#Match one or more digits and a separator\n"
            + "\\d+\\W+"
            + "#Match one or more word chars\n"
            + "\\w+";

    public static void main(String[] args) {
        // Compile this pattern with COMMENTS.
        // ... Whitespace is ignored and comments are allowed.
        Pattern pattern = Pattern.compile(example, Pattern.COMMENTS);

        // This line will succeed.
        Matcher m = pattern.matcher("line 123: BIRD");
        if (m.matches()) {
            System.out.println(m.toString());
        }

        // This will not succeed.
        m = pattern.matcher("test failure");
        if (m.matches()) {
            System.out.println(false); // Not reached.
        }
    }
}

Output

java.util.regex.Matcher[pattern=#Match line string
line\W#Match one or more digits and a separator
\d+\W+#Match one or more word chars
\w+ region=0,14 lastmatch=line 123: BIRD]
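Comments mode can also be switched on from inside the pattern with the embedded flag (?x), which avoids passing Pattern.COMMENTS to compile. The sketch below assumes a simple "three digits, hyphen, four digits" format purely for illustration; the class name InlineComments and the method isPhone are hypothetical.

```java
import java.util.regex.Pattern;

class InlineComments {
    // (?x) turns on comments mode from inside the pattern itself.
    private static final Pattern PHONE = Pattern.compile(
            "(?x)        # enable comments mode\n"
          + "\\d{3}      # three digits\n"
          + "-           # a hyphen\n"
          + "\\d{4}      # four digits\n");

    static boolean isPhone(String input) {
        return PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhone("555-1234")); // true
        System.out.println(isPhone("555 1234")); // false
    }
}
```

The spaces used to line up the comments are ignored, so the effective pattern is \d{3}-\d{4}.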

Matches.

The String class has its own matches method. It receives a Regex string. If the pattern we supply matches the entire string we call matches() on, we get a true result. Otherwise it returns false.

Note: Calling value.matches(regex) gives the same result as Pattern.matches(regex, value). But this syntax may be easier to use in programs.

Java program that uses matches

public class Program {
    public static void main(String[] args) {
        String value = "carrots";

        // This regular expression matches.
        boolean result1 = value.matches("c.*s");
        System.out.println(result1);

        // This regular expression does not match.
        boolean result2 = value.matches("c.*x");
        System.out.println(result2);
    }
}

Output

true
false

Word count.

A regular expression can be used to count words. The split() method is helpful here. But a faster option is to use a for-loop.
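Both approaches can be sketched as follows, assuming that a "word" means a run of non-whitespace characters. The class name WordCount and both method names are hypothetical, and no timing claim is made for this particular code.

```java
class WordCount {
    // Counts words by splitting on runs of whitespace.
    static int countWithSplit(String text) {
        String trimmed = text.trim();
        // Splitting an empty string yields one empty element, so guard it.
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Counts words with a loop: each whitespace-to-text transition starts a word.
    static int countWithLoop(String text) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            boolean isSpace = Character.isWhitespace(text.charAt(i));
            if (!isSpace && !inWord) {
                count++;
            }
            inWord = !isSpace;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "  the quick  brown fox ";
        System.out.println(countWithSplit(text)); // 4
        System.out.println(countWithLoop(text));  // 4
    }
}
```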

HTML.

This is perhaps the most used document format in the world. With a Regex we can manipulate simple HTML tags. But this is not a general-purpose solution.
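As a rough sketch of the simple case, the pattern below removes anything that looks like a tag. It assumes well-formed markup with no ">" inside attribute values, comments or scripts, so it is not suitable for arbitrary HTML. The class name TagStripper and the method stripTags are hypothetical.

```java
import java.util.regex.Pattern;

class TagStripper {
    // Matches "<", then any characters except ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>")); // Hello world
    }
}
```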

Often, regular expressions reduce performance. The special text language used has some costs. With String methods and for-loops, we can directly manipulate and test Strings.

Complex tasks.

When complexity builds, writing custom loops becomes a challenge. With Regex we simplify programs. We make them easier to write, to understand.
Dot Net Perls
© 2007-2019 Sam Allen. All rights reserved. Written by Sam Allen, info@dotnetperls.com.