Split. Often strings are read in from lines of a file, and these lines have many parts separated by delimiters. We use split() to break them apart.
Regex. Split in Java takes a Regex as its delimiter. We can split on a single character (like a comma), or on a more complex pattern (such as a character class).
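Because the delimiter is a Regex, characters such as a period or a vertical bar carry special meaning. The sketch below shows one way to split on a literal period, using Pattern.quote from java.util.regex. The class name SplitLiteral and the helper splitLiteral are illustrative names, not part of the original examples.

```java
import java.util.regex.Pattern;

public class SplitLiteral {
    // Hypothetical helper: treat the delimiter as plain text, not as a regex.
    static String[] splitLiteral(String value, String delimiter) {
        return value.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String value = "a.b.c";
        // An unescaped period matches every character, so only empty
        // strings remain; they are dropped and the array is empty.
        System.out.println(value.split(".").length);
        // Quoting the period makes it an ordinary character.
        for (String part : splitLiteral(value, ".")) {
            System.out.println(part);
        }
    }
}
```

Escaping by hand (as in "\\.") works too. Pattern.quote is mainly convenient when the delimiter comes from a variable.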
A simple example. Let's begin with this example. We introduce a string that has 2 commas in it, separating 3 strings (cat, dog, bird). We split on a comma.
Return Split returns a String array. We then loop over that array's elements with a for-each loop. We display them.
public class Program {
    public static void main(String[] args) {
        // This string has 3 words separated by commas.
        String value = "cat,dog,bird";
        // Split on a comma.
        String[] parts = value.split(",");
        // Display result parts.
        for (String part : parts) {
            System.out.println(part);
        }
    }
}

cat
dog
bird
Split lines in file. Here we use BufferedReader and FileReader to read in a text file. Then, while looping over it, we split each line. In this way we parse a CSV file with split.
carrot,squash,turnip
potato,spinach,kale

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Program {
    public static void main(String[] args) throws IOException {
        // Open this file; try-with-resources closes the reader for us.
        try (BufferedReader reader = new BufferedReader(new FileReader(
                "C:\\programs\\file.txt"))) {
            // Read lines from file.
            while (true) {
                String line = reader.readLine();
                if (line == null) {
                    break;
                }
                // Split line on comma.
                String[] parts = line.split(",");
                for (String part : parts) {
                    System.out.println(part);
                }
                System.out.println();
            }
        }
    }
}

carrot
squash
turnip
potato
spinach
kale
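Printing each part is fine for a demo, but a program usually keeps the parsed rows. Here is a minimal sketch of that idea, assuming well-formed lines with no quoted commas. The helper parseRows is a hypothetical name, and a StringReader stands in for the FileReader so the example runs without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SplitRows {
    // Hypothetical helper: read every line and split it on commas.
    static List<String[]> parseRows(Reader source) {
        List<String[]> rows = new ArrayList<>();
        // try-with-resources closes the reader even if readLine throws.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(","));
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a FileReader here.
        Reader source = new StringReader("carrot,squash,turnip\npotato,spinach,kale");
        for (String[] row : parseRows(source)) {
            System.out.println(row.length + " parts, first is " + row[0]);
        }
    }
}
```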
Either character. Often data is inconsistent. Sometimes we need to split on a range or set of characters. With split, this is possible. Here we split on a comma and a colon.
Tip With square brackets, we specify the possible characters to split upon. So we split on all colons and commas, with one call.
public class Program {
    public static void main(String[] args) {
        String line = "carrot:orange,apple:red";
        // Split on comma or colon.
        String[] parts = line.split("[,:]");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}

carrot
orange
apple
red
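Inconsistent data often has stray spaces around its delimiters as well. One possible extension of the bracket pattern, shown as a sketch, wraps it in \s* so the surrounding whitespace is consumed with the delimiter. The name splitFields is hypothetical.

```java
public class SplitEither {
    // Hypothetical helper: split on a comma or colon, ignoring nearby spaces.
    static String[] splitFields(String line) {
        // \s* on each side absorbs whitespace around the delimiter.
        return line.trim().split("\\s*[,:]\\s*");
    }

    public static void main(String[] args) {
        String line = "carrot : orange, apple:red";
        for (String part : splitFields(line)) {
            System.out.println(part);
        }
    }
}
```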
Count, separate words. We can use more advanced character patterns in split. Here we separate a String based on non-word characters. We use "\W+" to mean this.
Detail The pattern means "one or more non-word characters." A plus means "one or more" and an uppercase W means a non-word character (anything other than a letter, digit, or underscore).
Note The comma and its following space are treated as a single delimiter. So two characters are matched as one delimiter.
public class Program {
    public static void main(String[] args) {
        String line = "hello, how are you?";
        // Split on 1+ non-word characters.
        String[] words = line.split("\\W+");
        // Count words.
        System.out.println(words.length);
        // Display words.
        for (String word : words) {
            System.out.println(word);
        }
    }
}

4
hello
how
are
you
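One caveat with this approach: if the string begins with a delimiter, split returns an empty first element, which inflates the count. Trailing empty elements are dropped by default. The sketch below shows the effect and one way to guard against it. The helper countWords is an illustrative name, not part of the original example.

```java
public class SplitLeading {
    // Hypothetical helper: count words, ignoring leading punctuation.
    static int countWords(String line) {
        // Remove leading non-word characters so no empty first element appears.
        String trimmed = line.replaceFirst("^\\W+", "");
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\W+").length;
    }

    public static void main(String[] args) {
        String line = "...hello, how are you?";
        // The leading dots produce an empty first element: 5 parts.
        System.out.println(line.split("\\W+").length);
        // The guarded version counts only the 4 words.
        System.out.println(countWords(line));
    }
}
```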
Numbers. This example splits a string apart and then uses parseInt to convert those parts into ints. It splits on a two-char sequence. Then in a loop, it calls parseInt on each String.
public class Program {
    public static void main(String[] args) {
        String line = "1, 2, 3";
        // Split on two-char sequence.
        String[] numbers = line.split(", ");
        // Display numbers.
        for (String number : numbers) {
            int value = Integer.parseInt(number);
            System.out.println(value + " * 20 = " + value * 20);
        }
    }
}

1 * 20 = 20
2 * 20 = 40
3 * 20 = 60
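Splitting on an exact two-char sequence breaks down when the spacing varies, because parseInt rejects a String with a stray space. A possible variation, sketched here, tolerates optional whitespace around each comma and collects the values into an int array. The helper parseInts is a hypothetical name. It still throws NumberFormatException on a part that is not a number, including an empty part.

```java
import java.util.Arrays;

public class SplitNumbers {
    // Hypothetical helper: parse comma-separated ints with flexible spacing.
    static int[] parseInts(String line) {
        return Arrays.stream(line.trim().split("\\s*,\\s*"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        int[] values = parseInts("1,2 , 3");
        for (int value : values) {
            System.out.println(value + " * 20 = " + value * 20);
        }
    }
}
```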
Limit. Split accepts an optional second parameter, an int limit. If we provide a positive limit, the result array has at most that many elements. Any extra parts remain unsplit in the last element.
Info The first argument to split is always a Regex, with or without a limit. Here we escape the vertical bar so it is treated like a normal char.
Here We get the first 2 parts split apart correctly, and the third element has all the remaining (unsplit) text.
public class Program {
    public static void main(String[] args) {
        String value = "a|b|c|d|e";
        // Use limit of just 3 parts.
        // ... Escape the bar for a Regex.
        String[] parts = value.split("\\|", 3);
        // Only 3 elements are in the result array.
        for (String part : parts) {
            System.out.println(part);
        }
    }
}

a
b
c|d|e
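The limit also controls what happens to trailing empty strings. With no limit (or a limit of 0) they are removed, and with a negative limit they are kept. This small sketch compares the cases. The helper countFields is a hypothetical name used only for illustration.

```java
public class SplitLimit {
    // Hypothetical helper: number of elements produced for a given limit.
    static int countFields(String line, int limit) {
        return line.split(",", limit).length;
    }

    public static void main(String[] args) {
        String line = "a,b,,";
        // Limit 0 behaves like split(","): trailing empty strings are dropped.
        System.out.println(countFields(line, 0));
        // A negative limit keeps the two trailing empty strings.
        System.out.println(countFields(line, -1));
        // A positive limit caps the number of elements.
        System.out.println(countFields(line, 2));
    }
}
```

A negative limit is useful when each line must yield a fixed number of columns, even if the last ones are blank.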
Pattern.compile, split. A split method is available on the Pattern class, found in java.util.regex. We can compile a Pattern once and reuse it many times. This can enhance performance.
Note Compiling the Pattern once avoids recompiling the Regex on every split() call made on that Pattern. But this only helps if many splits are done.
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {
        // Separate based on number delimiters.
        Pattern p = Pattern.compile("\\d+");
        String value = "abc100defgh9ij";
        String[] elements = p.split(value);
        // Display our results.
        for (String element : elements) {
            System.out.println(element);
        }
    }
}

abc
defgh
ij
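A compiled Pattern can also hand back its parts as a stream through splitAsStream, available since Java 8. The sketch below keeps the Pattern in a static field so it is compiled only once, and filters out the empty element that appears when the input starts with digits. The names DIGITS and letters are illustrative assumptions.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitStream {
    // Compiled once and reused by every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: the non-empty parts between runs of digits.
    static List<String> letters(String value) {
        return DIGITS.splitAsStream(value)
                .filter(part -> !part.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String element : letters("abc100defgh9ij")) {
            System.out.println(element);
        }
    }
}
```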
Benchmark, pattern split. We can improve the speed of splitting strings based on regular expressions by using Pattern.compile. We create a delimiter pattern. Then we call split() with it.
Version 1 This version of the code uses Pattern split(): it reuses the same Pattern instance many times.
Version 2 This code uses split() with a Regex argument, so the Regex is processed again on each call instead of being reused.
Result When many Strings are split, a call to Pattern.compile before using the Pattern's split method improves performance.
import java.util.regex.Pattern;

public class Program {
    public static void main(String[] args) {
        // ... Create a delimiter pattern.
        Pattern pattern = Pattern.compile("\\W+");
        String line = "cat; dog--ABC";
        long t1 = System.currentTimeMillis();
        // Version 1: use split method on Pattern.
        for (int i = 0; i < 1000000; i++) {
            String[] values = pattern.split(line);
            if (values.length != 3) {
                System.out.println(false);
            }
        }
        long t2 = System.currentTimeMillis();
        // Version 2: use String split method.
        for (int i = 0; i < 1000000; i++) {
            String[] values = line.split("\\W+");
            if (values.length != 3) {
                System.out.println(false);
            }
        }
        long t3 = System.currentTimeMillis();
        // ... Benchmark results.
        System.out.println(t2 - t1);
        System.out.println(t3 - t2);
    }
}

471 ms, Pattern split
549 ms, String split
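Timings like these depend on the machine, the JVM version, and how much of the code has been JIT-compiled when the clock starts. As a rough sketch (not a rigorous benchmark), the variation below runs a warm-up pass before timing and uses System.nanoTime. The names timeNanos and sink are hypothetical, and no output is shown because results will differ between systems.

```java
import java.util.regex.Pattern;

public class SplitTiming {
    // Accumulates lengths so the split work has a visible effect.
    static int sink;

    // Hypothetical helper: warm up, then time the given number of runs.
    static long timeNanos(Runnable task, int iterations) {
        for (int i = 0; i < iterations; i++) {
            task.run(); // warm-up pass, not timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\W+");
        String line = "cat; dog--ABC";
        int iterations = 1_000_000;
        long compiled = timeNanos(() -> sink += pattern.split(line).length, iterations);
        long plain = timeNanos(() -> sink += line.split("\\W+").length, iterations);
        System.out.println("Pattern split: " + compiled / 1_000_000 + " ms");
        System.out.println("String split: " + plain / 1_000_000 + " ms");
    }
}
```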
Join. This method combines Strings together with a delimiter String that we specify. Join is flexible: it can handle a String array (or another Iterable) or individual Strings.
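Split and String.join work well as a pair. As a small sketch, the helper below (replaceDelimiter is a hypothetical name) splits on one literal delimiter and joins the parts with another.

```java
import java.util.regex.Pattern;

public class SplitJoin {
    // Hypothetical helper: swap one literal delimiter for another.
    static String replaceDelimiter(String value, String from, String to) {
        // Quote the delimiter so it is not interpreted as a regex.
        String[] parts = value.split(Pattern.quote(from));
        return String.join(to, parts);
    }

    public static void main(String[] args) {
        System.out.println(replaceDelimiter("cat,dog,bird", ",", " | "));
    }
}
```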
Word count. We can count the words in a string by splitting the string on non-word (or space) characters. This is not the fastest method, but it tends to be a fairly accurate one.
With split, we use a regular expression-based pattern. But for simple cases, we provide the delimiter itself as the pattern. This too works. Split is elegant and powerful.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on Feb 23, 2023.