Count words. A String contains text divided into words. With a method, we can count the number of words in the String. This can be implemented in many ways.
With split, we use a regular expression pattern to separate likely words. Then we access the array's length. With a for-loop, we use the Character class to detect likely word separators.
Split implementation. Let us begin with the split() version. We introduce countWords: this method separates a String into an array of strings. We split on non-word chars.
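Detail The regular expression pattern "\W+" (written "\\W+" in a Java string literal) matches one or more non-word characters. By default, a Java word character is a letter, digit or underscore ([a-zA-Z_0-9]), so the pattern splits on spaces and punctuation.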
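Detail An if-statement detects the empty string, for which split returns a single empty element. This works for the cases tested, but it is not always enough: a string that begins with a non-word char, such as " hello", produces a leading empty element, so the count comes out one too high.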
public class Program {
    public static int countWords(String value) {
        // Split on non-word chars.
        String[] words = value.split("\\W+");
        // Handle an empty string.
        if (words.length == 1 && words[0].length() == 0) {
            return 0;
        }
        // Return array length.
        return words.length;
    }

    public static void main(String[] args) {
        String value = "To be or not to be, that is the question.";
        int count = countWords(value);
        System.out.println(count);
        value = "Stately, plump Buck Mulligan came from the stairhead";
        count = countWords(value);
        System.out.println(count);
        System.out.println(countWords(""));
    }
}

10
8
0
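Counting matches of \w+, instead of splitting on \W+, is one way to avoid empty elements altogether: a leading separator, as in " hello", never yields an empty string to count. This is a minimal alternative sketch, not the split version above; the class name MatchCount is illustrative. It uses java.util.regex.Pattern and Matcher.find, and it compiles the pattern once in a static field, whereas String.split with a pattern like "\\W+" compiles it on each call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchCount {
    // Compiled once: one or more word chars (letters, digits, underscore).
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Counts runs of word chars instead of splitting on separators,
    // so leading or trailing punctuation never creates empty elements.
    public static int countWords(String value) {
        Matcher m = WORD.matcher(value);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("To be or not to be, that is the question.")); // 10
        System.out.println(countWords(" hello")); // 1
        System.out.println(countWords("..."));    // 0
        System.out.println(countWords(""));       // 0
    }
}
```

Because this counts the words themselves rather than the gaps between them, no special case for the empty string is needed.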
Loop version. Let us rewrite our previous countWords method. This version uses a simple loop. We use the Character class to detect certain word boundaries.
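Detail This version of countWords avoids the regular expression and the array of substrings that split allocates. Both versions take time proportional to the length of the string, but this one simply loops through all characters once.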
Detail Character.isWhitespace detects whether a char is considered whitespace (this includes spaces, newlines and tabs).
Detail Character.isLetterOrDigit is a convenience method. It returns true if the char is a letter (either upper or lowercase) or a digit (like 1, 2 or 3).
Note countWords here detects a whitespace character, and if a word-start character follows it, the variable "c" is incremented.
public class Program {
    public static int countWords(String value) {
        int c = 0;
        for (int i = 0; i < value.length(); i++) {
            // A word can start at index 0, or right after a whitespace char.
            if (i == 0 || Character.isWhitespace(value.charAt(i - 1))) {
                char ch = value.charAt(i);
                // See if this char is a word start character.
                // ... Some punctuation chars can start a word.
                if (Character.isLetterOrDigit(ch)
                        || ch == '"'
                        || ch == '(') {
                    c++;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        String value = "To be or not to be, that is the question.";
        int count = countWords(value);
        System.out.println(count);
        value = "Stately, plump Buck Mulligan came from the stairhead";
        count = countWords(value);
        System.out.println(count);
        System.out.println(countWords(""));
    }
}

10
8
0
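To see which chars the two Character methods accept, this small sketch (the class name CharChecks is illustrative) prints both results for a few sample chars, written as Unicode escapes where they are hard to type. Character.isWhitespace returns false for the non-breaking space U+00A0, so a whitespace-based loop can undercount text that uses it between words. Character.isLetterOrDigit, on the other hand, accepts accented letters such as U+00E9 (é).

```java
public class CharChecks {
    public static void main(String[] args) {
        // Space, tab, newline, non-breaking space, lowercase, uppercase,
        // accented letter, digit, double quote, opening parenthesis.
        char[] samples = {' ', '\t', '\n', '\u00A0', 'a', 'Z', '\u00E9', '7', '"', '('};
        for (char ch : samples) {
            // Print the code point in hex, then both Character checks.
            System.out.printf("U+%04X isWhitespace=%b isLetterOrDigit=%b%n",
                    (int) ch, Character.isWhitespace(ch), Character.isLetterOrDigit(ch));
        }
    }
}
```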
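Some issues, for-loop. The for-loop method (the second example) has some issues. It accepts only a few punctuation characters (a double quote and an opening parenthesis) as word starts, so a word that begins with another symbol, such as a single quote or a bracket, is not counted. More checks may need to be added.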
Thus, we developed a workable approach for a countWords method, but not an ideal implementation.
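One way to drop the list of punctuation checks is to count runs of non-separator chars, so that any char which is not a separator can start a word. This is a swapped-in technique (a simple two-state scan), not the loop method above; the class name RunCount and the helper isSeparator are hypothetical. isSeparator also accepts Character.isSpaceChar, which treats the non-breaking space as a separator.

```java
public class RunCount {
    // Hypothetical helper: Java whitespace plus Unicode space separators
    // (Character.isSpaceChar covers the non-breaking space).
    static boolean isSeparator(char ch) {
        return Character.isWhitespace(ch) || Character.isSpaceChar(ch);
    }

    // Counts maximal runs of non-separator chars; each run is one word,
    // so no list of word-start punctuation is needed.
    public static int countWords(String value) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < value.length(); i++) {
            if (isSeparator(value.charAt(i))) {
                inWord = false;
            } else if (!inWord) {
                // First char of a new run: one more word.
                inWord = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("To be or not to be, that is the question.")); // 10
        System.out.println(countWords("He said 'yes' [twice]")); // 4
        System.out.println(countWords("Hello\u00A0world"));      // 2
        System.out.println(countWords(""));                      // 0
    }
}
```

The trade-off is that a stray symbol surrounded by spaces, such as a lone dash, now counts as a word. For "He said 'yes' [twice]" this sketch returns 4, while the loop method above, which only accepts '"' and '(' as punctuation word starts, does not count 'yes' or [twice].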
A review. In counting words, we require approximations. Some sequences, like numbers, may or may not be considered words. Hyphenated words too are an issue.
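To make these approximations concrete, the sketch below counts the same sentence under three definitions of a word, each written as a regular expression. The class WordDefinitions and the helper countMatches are illustrative. The middle pattern, which joins letters and digits across an apostrophe or hyphen, is one assumed definition rather than a standard one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordDefinitions {
    // Counts non-overlapping matches of a pattern in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Don't use well-known words like 3.14.";
        // Runs of word chars: breaks at apostrophes, hyphens and dots.
        Pattern wordChars = Pattern.compile("\\w+");
        // Assumed definition: letters/digits, optionally joined by ' or -.
        Pattern joined = Pattern.compile("[\\p{L}\\p{N}]+(?:['\\-][\\p{L}\\p{N}]+)*");
        // Runs of non-whitespace: punctuation stays attached to words.
        Pattern nonSpace = Pattern.compile("\\S+");
        System.out.println(countMatches(wordChars, text)); // 9
        System.out.println(countMatches(joined, text));    // 7
        System.out.println(countMatches(nonSpace, text));  // 6
    }
}
```

Only the last definition keeps 3.14 as one word, but it also keeps the trailing period attached, so none of the three is right for every text.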
Dot Net Perls is a collection of pages with code examples, which are updated to stay current. Programming is an art, and it can be learned from examples.
Sam Allen is passionate about computer languages, and he maintains 100% of the material available on this website. He hopes it makes the world a nicer place.