Split
Often strings are read in from lines of a file, and these lines have many parts separated by delimiters. With split() we break them apart.
Regex. Split in Java uses a Regex. A single character (like a comma) can be split upon. Or a more complex pattern (with character classes) can be used.
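Because the argument is a Regex, a delimiter such as a period or a vertical bar has a special meaning and must be escaped. The sketch below illustrates this. The class name EscapeDelimiter and the helper splitLiteral are names made up for this example; Pattern.quote is the standard method that turns a string into a literal pattern.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class EscapeDelimiter {

    // Hypothetical helper: split on a delimiter taken literally, not as a Regex.
    static String[] splitLiteral(String value, String delimiter) {
        return value.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String value = "a.b.c";

        // An unescaped period matches any char, so every char is a delimiter.
        // ... All resulting parts are empty and are dropped from the array.
        System.out.println(value.split(".").length);

        // Quoting the delimiter splits on the period itself.
        System.out.println(Arrays.toString(splitLiteral(value, ".")));
    }
}
```

This prints 0 and then [a, b, c]. Escaping by hand (as in "\\.") has the same effect as Pattern.quote for a single character.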
Let's begin with this example. We introduce a string that has 2 commas in it, separating 3 strings (cat, dog, bird). We split on a comma. Split returns a String array. We then loop over that array's elements with a for-each loop and display them.

    public class Program {
        public static void main(String[] args) {
            // This string has 3 words separated by commas.
            String value = "cat,dog,bird";
            // Split on a comma.
            String[] parts = value.split(",");
            // Display result parts.
            for (String part : parts) {
                System.out.println(part);
            }
        }
    }

Output:

    cat
    dog
    bird
Split lines in file. Here we use BufferedReader and FileReader to read in a text file. Then, while looping over it, we split each line. In this way we parse a simple CSV file with split. We use the System.out.println method to display each part from each line to the screen.

Contents of file.txt:

    carrot,squash,turnip
    potato,spinach,kale

Program:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class Program {
        public static void main(String[] args) throws IOException {
            // Open this file. The reader is closed automatically.
            try (BufferedReader reader = new BufferedReader(new FileReader(
                    "C:\\programs\\file.txt"))) {
                // Read lines from file until readLine returns null.
                String line;
                while ((line = reader.readLine()) != null) {
                    // Split line on comma.
                    String[] parts = line.split(",");
                    for (String part : parts) {
                        System.out.println(part);
                    }
                    System.out.println();
                }
            }
        }
    }

Output:

    carrot
    squash
    turnip

    potato
    spinach
    kale
Often data is inconsistent. Sometimes we need to split on a range or set of characters. With split, this is possible. Here we split on a comma and a colon.

    public class Program {
        public static void main(String[] args) {
            String line = "carrot:orange,apple:red";
            // Split on comma or colon.
            String[] parts = line.split("[,:]");
            for (String part : parts) {
                System.out.println(part);
            }
        }
    }

Output:

    carrot
    orange
    apple
    red
Count, separate words. We can use more advanced character patterns in split. Here we separate a String based on non-word characters. We use the pattern \W+ (written "\\W+" in Java source) to mean one or more non-word characters.

    public class Program {
        public static void main(String[] args) {
            String line = "hello, how are you?";
            // Split on 1+ non-word characters.
            String[] words = line.split("\\W+");
            // Count words.
            System.out.println(words.length);
            // Display words.
            for (String word : words) {
                System.out.println(word);
            }
        }
    }

Output:

    4
    hello
    how
    are
    you
This example splits a string apart and then uses parseInt to convert those parts into ints. It splits on a two-char sequence (a comma and a space). Then in a loop, it calls parseInt on each String.

    public class Program {
        public static void main(String[] args) {
            String line = "1, 2, 3";
            // Split on two-char sequence.
            String[] numbers = line.split(", ");
            // Display numbers.
            for (String number : numbers) {
                int value = Integer.parseInt(number);
                System.out.println(value + " * 20 = " + value * 20);
            }
        }
    }

Output:

    1 * 20 = 20
    2 * 20 = 40
    3 * 20 = 60
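If the spacing around the commas is not consistent, splitting on the exact sequence ", " leaves stray spaces in the parts, and parseInt throws a NumberFormatException on them. A minimal sketch for that case splits on a comma with optional whitespace on either side. The class name ParseNumbers and the helper parseAll are hypothetical names used only for this illustration.

```java
public class ParseNumbers {

    // Hypothetical helper: split on a comma with optional whitespace, then parse.
    static int[] parseAll(String line) {
        // Trim first so leading or trailing spaces do not reach parseInt.
        String[] parts = line.trim().split("\\s*,\\s*");
        int[] result = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Integer.parseInt(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Spacing around the commas varies here.
        for (int value : parseAll("1,2 ,  3")) {
            System.out.println(value * 20);
        }
    }
}
```

This prints 20, 40 and 60. An empty input line is not handled here: it still reaches parseInt as an empty string and throws.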
Split accepts an optional second parameter, an int limit. If we provide a positive limit, the result array has (at most) that many elements. Any extra parts remain part of the last element.

Regex. Here we escape the vertical bar so it is treated like a normal char.

    public class Program {
        public static void main(String[] args) {
            String value = "a|b|c|d|e";
            // Use limit of just 3 parts.
            // ... Escape the bar for a Regex.
            String[] parts = value.split("\\|", 3);
            // Only 3 elements are in the result array.
            for (String part : parts) {
                System.out.println(part);
            }
        }
    }

Output:

    a
    b
    c|d|e
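The limit also controls what happens to trailing empty strings. With no limit (or a limit of 0) they are removed from the result; with a negative limit they are kept. This matters for lines whose last fields are empty. The sketch below shows the difference; the class name LimitDemo and the helper countFields are hypothetical names for this example.

```java
import java.util.Arrays;

public class LimitDemo {

    // Hypothetical helper: count fields, keeping trailing empty ones.
    static int countFields(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        String value = "a,b,,";

        // Default: trailing empty strings are removed.
        System.out.println(Arrays.toString(value.split(",")));

        // Negative limit: trailing empty strings are kept.
        System.out.println(Arrays.toString(value.split(",", -1)));

        System.out.println(countFields(value));
    }
}
```

This prints [a, b], then [a, b, , ], then 4.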
Pattern.compile, split. A split method is available on the Pattern class, found in java.util.regex. We can compile a Pattern and reuse it many times. This can enhance performance.

Pattern.compile parses the Regex once, so later split() calls on that Pattern skip this step. But this only helps if many splits are done.

    import java.util.regex.Pattern;

    public class Program {
        public static void main(String[] args) {
            // Separate based on number delimiters.
            Pattern p = Pattern.compile("\\d+");
            String value = "abc100defgh9ij";
            String[] elements = p.split(value);
            // Display our results.
            for (String element : elements) {
                System.out.println(element);
            }
        }
    }

Output:

    abc
    defgh
    ij
We can improve the speed of splitting strings based on regular expressions by using Pattern.compile. We create a delimiter pattern. Then we call split() with it.

Version 1: This code calls the Pattern split(): it reuses the same Pattern instance many times.

Version 2: This code calls the String split() with a Regex argument, so it does not reuse the same Regex.

Result: In this test, calling Pattern.compile before using its split method improved performance.

    import java.util.regex.Pattern;

    public class Program {
        public static void main(String[] args) {
            // ... Create a delimiter pattern.
            Pattern pattern = Pattern.compile("\\W+");
            String line = "cat; dog--ABC";

            long t1 = System.currentTimeMillis();

            // Version 1: use split method on Pattern.
            for (int i = 0; i < 1000000; i++) {
                String[] values = pattern.split(line);
                if (values.length != 3) {
                    System.out.println(false);
                }
            }

            long t2 = System.currentTimeMillis();

            // Version 2: use String split method.
            for (int i = 0; i < 1000000; i++) {
                String[] values = line.split("\\W+");
                if (values.length != 3) {
                    System.out.println(false);
                }
            }

            long t3 = System.currentTimeMillis();

            // ... Benchmark results.
            System.out.println(t2 - t1);
            System.out.println(t3 - t2);
        }
    }

Results:

    471 ms, Pattern split
    549 ms, String split
Join. This method combines Strings together: we specify our desired delimiter String. Join is flexible. It can handle a String array or individual Strings.
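A short sketch of both forms follows, assuming Java 8 or later (where String.join is available). The class name JoinDemo is made up for this example.

```java
public class JoinDemo {
    public static void main(String[] args) {
        String[] parts = "cat,dog,bird".split(",");

        // Join an array with a new delimiter.
        System.out.println(String.join("-", parts));

        // Join individual Strings.
        System.out.println(String.join("/", "a", "b", "c"));
    }
}
```

This prints cat-dog-bird and then a/b/c.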
We can count the words in a string by splitting the string on non-word (or space) characters. This is not the fastest method, but it tends to be a fairly accurate one.
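One way to wrap this idea in a method is sketched here. The class name WordCount and the helper countWords are hypothetical. The sketch assumes that leading delimiters should not count as a word: without the extra step, split returns an empty first element when the string starts with a delimiter.

```java
public class WordCount {

    // Hypothetical helper: count words by splitting on non-word characters.
    static int countWords(String text) {
        // Remove leading delimiters: otherwise split yields an empty first element.
        String trimmed = text.replaceFirst("^\\W+", "");
        if (trimmed.isEmpty()) {
            // Splitting an empty string would return one empty element.
            return 0;
        }
        return trimmed.split("\\W+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hello, how are you?"));
        System.out.println(countWords("  ...leading punctuation"));
        System.out.println(countWords(""));
    }
}
```

This prints 4, 2 and 0. Because an apostrophe is a non-word character, a contraction such as "don't" is counted as two words with this pattern.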
With split, we use a regular expression-based pattern. But for simple cases, we provide the delimiter itself as the pattern. This too works. Split is elegant and powerful.