Count words
A String contains text divided into words. With a method, we can count the number of words in the String. This can be implemented in many ways.
With split, we use a regular expression pattern to separate likely words. Then we access the array's length. With a for-loop, we use the Character class to detect likely word separators.
Split implementation
Let us begin with the split() version. We introduce countWords: this method separates a String into an array of strings. We split on non-word chars.
An if-statement is used to detect a zero-word string. This logic works for the cases tested, but may not always be enough: a string that begins with a non-word char produces an empty first element, which is counted as a word.

    public class Program {

        public static int countWords(String value) {
            // Split on non-word chars.
            String[] words = value.split("\\W+");
            // Handle an empty string.
            if (words.length == 1 && words[0].length() == 0) {
                return 0;
            }
            // Return array length.
            return words.length;
        }

        public static void main(String[] args) {
            String value = "To be or not to be, that is the question.";
            int count = countWords(value);
            System.out.println(count);

            value = "Stately, plump Buck Mulligan came from the stairhead";
            count = countWords(value);
            System.out.println(count);

            System.out.println(countWords(""));
        }
    }

Output

    10
    8
    0
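The split() version can be made more tolerant of strings that begin with a separator. The following is a minimal sketch, not part of the original example: the class name SplitCountSketch is hypothetical, and it assumes that removing leading non-word chars before splitting is acceptable for the text being counted.

```java
class SplitCountSketch {

    // Hypothetical variant of countWords: strips leading separators first.
    static int countWords(String value) {
        // Remove leading non-word chars so split() yields no empty first element.
        String trimmed = value.replaceFirst("^\\W+", "");
        // Nothing left means there are no words at all.
        if (trimmed.isEmpty()) {
            return 0;
        }
        // Trailing empty strings are dropped by split(), so the length is the count.
        return trimmed.split("\\W+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("\"Hello,\" she said."));
        System.out.println(countWords("  leading spaces"));
        System.out.println(countWords("..."));
        System.out.println(countWords(""));
    }
}
```

For the quoted sentence this sketch returns 3, where splitting the untrimmed string would include an empty first element and give 4.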
Let us rewrite our previous countWords method. This version uses a simple loop. We use the Character class to detect certain word boundaries.
This countWords has less overhead. It just loops through all characters once and creates no array.
Character.isWhitespace returns true if a char is considered whitespace (this includes spaces, newlines and tabs).
countWords here detects a whitespace character (or the start of the string), and if a word-start character follows it, the variable "c" is incremented.

    public class Program {

        public static int countWords(String value) {
            int c = 0;
            for (int i = 0; i < value.length(); i++) {
                // See if previous char is a space (the start of the string also counts).
                if (i == 0 || Character.isWhitespace(value.charAt(i - 1))) {
                    // See if this char is a word start character.
                    // ... Some punctuation chars can start a word.
                    char ch = value.charAt(i);
                    if (Character.isLetterOrDigit(ch) || ch == '"' || ch == '(') {
                        c++;
                    }
                }
            }
            return c;
        }

        public static void main(String[] args) {
            String value = "To be or not to be, that is the question.";
            int count = countWords(value);
            System.out.println(count);

            value = "Stately, plump Buck Mulligan came from the stairhead";
            count = countWords(value);
            System.out.println(count);

            System.out.println(countWords(""));
        }
    }

Output

    10
    8
    0
For-loop
In the for-loop method (the second example) we have some issues. We check for certain punctuation characters, but more checks may need to be added.
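One way to make those extra checks easier to add is to keep the word-start test in its own method. This is only a sketch under assumptions: the names WordStartSketch, OPENERS and isWordStart are hypothetical, the list of opening characters is an example rather than a complete set, and the start of the string is treated as a word boundary.

```java
class WordStartSketch {

    // Hypothetical list of punctuation chars allowed to open a word; extend as needed.
    static final String OPENERS = "\"'([{";

    // A char starts a word if it is a letter, a digit, or a listed opener.
    static boolean isWordStart(char ch) {
        return Character.isLetterOrDigit(ch) || OPENERS.indexOf(ch) >= 0;
    }

    static int countWords(String value) {
        int count = 0;
        for (int i = 0; i < value.length(); i++) {
            // Assumption: index 0 counts as following a boundary.
            boolean afterBoundary = i == 0
                    || Character.isWhitespace(value.charAt(i - 1));
            if (afterBoundary && isWordStart(value.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("[see] 'this' {now}"));
        System.out.println(countWords("a - b"));
    }
}
```

With this layout, supporting another opening character means editing one string, and the loop stays unchanged.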
We now have a working countWords method, but not an ideal implementation.
In counting words, we require approximations. Some sequences, like numbers, may or may not be considered words. Hyphenated words too are an issue.
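To see how the definition of a word changes the count, we can match words directly with a Pattern instead of splitting on separators. This is a hedged sketch, not a recommended definition: the class TokenCountSketch and both patterns are assumptions chosen for illustration. One pattern keeps hyphenated words and numbers as single words. The other counts runs of letters only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenCountSketch {

    // Assumed definition 1: letters or digits, optionally joined by hyphens or apostrophes.
    static final Pattern JOINED =
            Pattern.compile("[\\p{L}\\p{N}]+(?:[-'][\\p{L}\\p{N}]+)*");

    // Assumed definition 2: runs of letters only, so numbers are not words.
    static final Pattern LETTERS_ONLY = Pattern.compile("\\p{L}+");

    // Count the non-overlapping matches of the pattern in the value.
    static int count(Pattern pattern, String value) {
        Matcher matcher = pattern.matcher(value);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(JOINED, "a well-known rule"));
        System.out.println(count(LETTERS_ONLY, "a well-known rule"));
        System.out.println(count(JOINED, "built in 1999"));
        System.out.println(count(LETTERS_ONLY, "built in 1999"));
    }
}
```

The program prints 3, 4, 3 and 2. The same text gives different counts under each definition, so the pattern should be chosen to fit the data.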