Split. Often strings are read in from lines of a file, and these lines have many parts separated by delimiters. We use split() to break them apart.
Regex. Split in Java takes a regular expression (Regex) as its delimiter. A single character (like a comma) can be split upon. Or a more complex pattern (with character classes and quantifiers) can be used.
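Because the argument is a Regex, some characters (such as a period or a vertical bar) have special meanings. This minimal sketch (the class name SplitDots is only for illustration) compares an unescaped period, an escaped one, and Pattern.quote, which builds a Regex that matches its argument literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDots {
    public static void main(String[] args) {
        String version = "1.2.3";

        // "." means "any character" in a Regex: every char is a delimiter,
        // all parts are empty, and trailing empty parts are dropped.
        String[] wrong = version.split(".");
        System.out.println(wrong.length);

        // Escaping the period matches a literal dot.
        String[] escaped = version.split("\\.");
        System.out.println(Arrays.toString(escaped));

        // Pattern.quote returns a Regex that matches "." literally.
        String[] quoted = version.split(Pattern.quote("."));
        System.out.println(Arrays.toString(quoted));
    }
}
```

The unescaped version prints 0, while the other two both print [1, 2, 3]. Pattern.quote is handy when the delimiter comes from user input and may contain any character.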
A simple example. Let's begin with this example. We introduce a string that has 2 commas in it, separating 3 strings (cat, dog, bird). We split on a comma.
Return Split returns a String array. We then loop over that array's elements with a for-each loop. We display them.
public class Program {
public static void main(String[] args) {
// This string has 3 words separated by commas.
String value = "cat,dog,bird";
// Split on a comma.
String[] parts = value.split(",");
// Display result parts.
for (String part : parts) {
System.out.println(part);
}
}
}

cat
dog
bird
Split lines in file. Here we use BufferedReader and FileReader to read in a text file. Then, while looping over it, we split each line. In this way we parse a simple CSV file (one without quoted fields) with split.
carrot,squash,turnip
potato,spinach,kale

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class Program {
public static void main(String[] args) throws IOException {
// Open this file; try-with-resources closes it even if an exception occurs.
try (BufferedReader reader = new BufferedReader(new FileReader(
"C:\\programs\\file.txt"))) {
// Read lines until readLine returns null at the end of the file.
String line;
while ((line = reader.readLine()) != null) {
// Split line on comma.
String[] parts = line.split(",");
for (String part : parts) {
System.out.println(part);
}
// Print a blank line after each line's parts.
System.out.println();
}
}
}
}

carrot
squash
turnip

potato
spinach
kale
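Split knows nothing about CSV quoting, so a comma inside a quoted field is treated as a delimiter too. This small sketch (the class name and data are hypothetical) shows the problem: the record has 2 fields, but split returns 3 parts. Files with quoted fields need a dedicated CSV parser.

```java
import java.util.Arrays;

class QuotedCsv {
    public static void main(String[] args) {
        // One record with two fields; the first field contains a comma.
        String line = "\"carrot, orange\",squash";

        // split sees every comma, including the one inside the quotes.
        String[] parts = line.split(",");
        System.out.println(parts.length);
        System.out.println(Arrays.toString(parts));
    }
}
```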
Either character. Often data is inconsistent. Sometimes we need to split on a range or set of characters. With split, this is possible. Here we split on either a comma or a colon.
Tip With square brackets, we specify the possible characters to split upon. So we split on all colons and commas, with one call.
public class Program {
public static void main(String[] args) {
String line = "carrot:orange,apple:red";
// Split on comma or colon.
String[] parts = line.split("[,:]");
for (String part : parts) {
System.out.println(part);
}
}
}

carrot
orange
apple
red
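A character class matches one delimiter character at a time, so two delimiters in a row produce an empty part between them. This sketch (class name is only illustrative) adds a plus to the class so a whole run of commas and colons counts as one delimiter.

```java
import java.util.Arrays;

class CharClassSplit {
    public static void main(String[] args) {
        // A comma followed directly by a colon.
        String line = "carrot,:orange";

        // [,:] matches a single char, so an empty part appears between them.
        System.out.println(Arrays.toString(line.split("[,:]")));

        // [,:]+ matches a run of delimiter chars as one delimiter.
        System.out.println(Arrays.toString(line.split("[,:]+")));
    }
}
```

The first call returns 3 elements (the middle one empty), and the second returns just "carrot" and "orange".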
Count, separate words. We can use more advanced character patterns in split. Here we separate a String based on non-word characters. We use "\W+" to mean this.
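Detail The pattern means "one or more non-word characters." A plus means "one or more" and \W (an uppercase W after a backslash) means a non-word character. In a Java string literal the backslash itself must be escaped, so we write "\\W+".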
Note The comma and its following space are treated as a single delimiter. So two characters are matched as one delimiter.
public class Program {
public static void main(String[] args) {
String line = "hello, how are you?";
// Split on 1+ non-word characters.
String[] words = line.split("\\W+");
// Count words.
System.out.println(words.length);
// Display words.
for (String word : words) {
System.out.println(word);
}
}
}

4
hello
how
are
you
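Split drops trailing empty strings (that is why the "?" at the end leaves no empty element), but it does not drop a leading one. In Java 8 and later, when the delimiter matches at the very start of the string, the result begins with an empty element. This sketch (class name is illustrative) shows that case.

```java
import java.util.Arrays;

class LeadingDelimiter {
    public static void main(String[] args) {
        // The text starts with non-word characters.
        String line = "...hello, how are you?";

        // The match at the start yields an empty first element;
        // the trailing empty element after "?" is still removed.
        String[] words = line.split("\\W+");
        System.out.println(words.length);
        System.out.println(Arrays.toString(words));
    }
}
```

Here the length is 5, not 4, so code that counts words this way should trim or skip that leading empty element.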
Numbers. This example splits a string apart and then uses parseInt to convert those parts into ints. It splits on a two-char sequence. Then in a loop, it calls parseInt on each String.
public class Program {
public static void main(String[] args) {
String line = "1, 2, 3";
// Split on two-char sequence.
String[] numbers = line.split(", ");
// Display numbers.
for (String number : numbers) {
int value = Integer.parseInt(number);
System.out.println(value + " * 20 = " + value * 20);
}
}
}

1 * 20 = 20
2 * 20 = 40
3 * 20 = 60
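Splitting on the exact sequence ", " fails if the spacing varies, because parseInt throws a NumberFormatException on a part like " 3". A minimal sketch, assuming commas are the only separators: we split on a comma with optional whitespace around it, and convert the parts with a stream. The helper name parseInts is hypothetical.

```java
import java.util.Arrays;

class ParseNumbers {
    // Hypothetical helper: splits on a comma with any surrounding
    // whitespace, then parses each part as an int.
    static int[] parseInts(String line) {
        return Arrays.stream(line.trim().split("\\s*,\\s*"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        // Spacing around the commas is inconsistent here.
        int[] values = parseInts("1,2 ,  3");
        System.out.println(Arrays.toString(values));
        System.out.println(Arrays.stream(values).sum());
    }
}
```

The trim() call handles whitespace at the ends of the line, which the Regex alone would leave attached to the first and last parts.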
Limit. Split accepts an optional second parameter, an int limit. If the limit is positive, the result array has (at most) that many elements. The last element holds the rest of the string, unsplit.
Info The first argument is always a Regex, with or without a limit. The vertical bar is a Regex metacharacter (it means "or"), so we escape it to have it treated like a normal char.
Here We get the first 2 parts split apart correctly, and the third part has all the remaining (unsplit) parts.
public class Program {
public static void main(String[] args) {
String value = "a|b|c|d|e";
// Use limit of just 3 parts.
// ... Escape the bar for a Regex.
String[] parts = value.split("\\|", 3);
// Only 3 elements are in the result array.
for (String part : parts) {
System.out.println(part);
}
}
}

a
b
c|d|e
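The limit also controls empty strings at the end. With no limit (the same as a limit of 0), trailing empty strings are removed; with a negative limit, the pattern is applied as many times as possible and they are kept. This matters for records that end with empty fields. The class name below is just for illustration.

```java
class TrailingEmpty {
    public static void main(String[] args) {
        // Two empty fields at the end of the record.
        String value = "a,b,,";

        // No limit (same as 0): trailing empty strings are removed.
        System.out.println(value.split(",").length);

        // Negative limit: no cap on parts, trailing empty strings are kept.
        System.out.println(value.split(",", -1).length);
    }
}
```

This prints 2 and then 4.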
Pattern.compile, split. A split method is available on the Pattern class, found in java.util.regex. We can compile a Pattern and reuse it many times. This can enhance performance.
Note Compiling a Pattern once lets every later split() call on that Pattern reuse it, while String split does not keep the compiled Regex between calls. This only helps if many splits are done.
import java.util.regex.Pattern;
public class Program {
public static void main(String[] args) {
// Separate based on number delimiters.
Pattern p = Pattern.compile("\\d+");
String value = "abc100defgh9ij";
String[] elements = p.split(value);
// Display our results.
for (String element : elements) {
System.out.println(element);
}
}
}

abc
defgh
ij
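In a larger program, a common way to reuse a Pattern is to store it in a static final field. Pattern instances are immutable and safe for use by multiple threads, so one instance can be shared. This sketch (class and method names are hypothetical) also uses splitAsStream, available since Java 8, which returns the parts as a Stream.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DigitSplitter {
    // Compiled once when the class loads, then shared by every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static String[] split(String value) {
        return DIGITS.split(value);
    }

    // splitAsStream yields the same parts as split, as a Stream.
    static List<String> splitToList(String value) {
        return DIGITS.splitAsStream(value).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", split("abc100defgh9ij")));
        System.out.println(splitToList("x1y22z"));
    }
}
```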
Benchmark, pattern split. We can improve the speed of splitting strings based on regular expressions by using Pattern.compile. We create a delimiter pattern. Then we call split() with it.
Version 1 This version of the code uses Pattern split(): it reuses the same Pattern instance many times.
Version 2 This code uses String split() with a Regex argument, so the Regex is compiled again on every call instead of being reused.
Result When many Strings are split, calling Pattern.compile once and then using that Pattern's split method was faster in this test.
import java.util.regex.Pattern;
public class Program {
public static void main(String[] args) {
// ... Create a delimiter pattern.
Pattern pattern = Pattern.compile("\\W+");
String line = "cat; dog--ABC";
long t1 = System.currentTimeMillis();
// Version 1: use split method on Pattern.
for (int i = 0; i < 1000000; i++) {
String[] values = pattern.split(line);
if (values.length != 3) {
System.out.println(false);
}
}
long t2 = System.currentTimeMillis();
// Version 2: use String split method.
for (int i = 0; i < 1000000; i++) {
String[] values = line.split("\\W+");
if (values.length != 3) {
System.out.println(false);
}
}
long t3 = System.currentTimeMillis();
// ... Benchmark results.
System.out.println(t2 - t1);
System.out.println(t3 - t2);
}
}

471 ms, Pattern split
549 ms, String split
Join. This method combines Strings together, placing our desired delimiter String between them. String.join is flexible: it accepts individual Strings, a String array, or an Iterable such as a List.
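String.join (Java 8 and later) is roughly the reverse of split. This sketch shows the three kinds of input; the class name is just for illustration.

```java
import java.util.Arrays;
import java.util.List;

class JoinDemo {
    public static void main(String[] args) {
        // Individual strings, passed as varargs.
        String a = String.join(",", "cat", "dog", "bird");

        // A String array works too, since the parameter is varargs.
        String[] parts = "cat,dog,bird".split(",");
        String b = String.join("|", parts);

        // Any Iterable of CharSequence, such as a List.
        List<String> list = Arrays.asList("x", "y", "z");
        String c = String.join(" - ", list);

        System.out.println(a);
        System.out.println(b);
        System.out.println(c);
    }
}
```

This prints "cat,dog,bird", "cat|dog|bird" and "x - y - z". Note that the delimiter for join is a plain String, not a Regex, so "|" needs no escaping here.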
Word count. We can count the words in a string by splitting the string on non-word (or space) characters. This is not the fastest method, but it tends to be a fairly accurate one.
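A minimal sketch of a word counter built on split with "\\W+" follows. The helper name countWords is hypothetical. It strips non-word characters from both ends first, because a leading delimiter would otherwise add an empty element, and it handles the empty string specially, because split on a string with no match returns one element (the input itself).

```java
class WordCount {
    // Hypothetical helper: counts runs of word characters,
    // treating everything else as a separator.
    static int countWords(String text) {
        // Remove non-word chars at both ends to avoid a leading empty element.
        String trimmed = text.replaceAll("^\\W+|\\W+$", "");
        if (trimmed.isEmpty()) {
            // "".split(...) would return one empty element, not zero.
            return 0;
        }
        return trimmed.split("\\W+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hello, how are you?"));
        System.out.println(countWords("  ...Split is elegant!"));
        System.out.println(countWords("?!"));
    }
}
```

This prints 4, 3 and 0. One limitation: \W treats an apostrophe as a separator, so "don't" counts as two words.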
With split, we use a regular expression-based pattern. But for simple cases, we provide the delimiter itself as the pattern. This too works. Split is elegant and powerful.
Sam Allen is passionate about computer languages, and he maintains 100% of the material available on this website. He hopes it makes the world a nicer place.
This page was last updated on Feb 23, 2023.