
Ruby Word Count: Split Method

Use the split method to count words in strings. Split on runs of whitespace characters.

Word count. A string often contains multiple words, separated by whitespace such as spaces, tabs, and newlines. The pattern "\s+" matches one or more whitespace characters, so it matches each separator between words. We split the string on that pattern and then return the resulting array's length.
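The "+" in the pattern matters. This short sketch, using an arbitrary sample string, compares splitting on a single whitespace character with splitting on a run of whitespace characters.

```ruby
text = "one  two\tthree"

# \s matches one whitespace character, so two adjacent spaces
# leave an empty string between them.
single = text.split(/\s/)
p single          # ["one", "", "two", "three"]

# \s+ matches a whole run of whitespace, so each run is one delimiter.
run = text.split(/\s+/)
p run             # ["one", "two", "three"]
puts run.length   # 3
```

With "\s" alone, the empty string would be counted as a word, so the count would be 4 instead of 3.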

Example. We introduce the wordcount method. It receives one argument, a string, and returns the length of the array that split produces. The split method breaks the string apart, treating each sequence of whitespace characters as a single delimiter.

Then: I test wordcount with four example strings. I verified that each string contains the number of words the method reports.

Info: The whitespace-only string and the empty string should both contain zero words. Split returns an empty array for both, so the method returns 0, as expected.

Ruby program that counts words

```ruby
def wordcount(value)
  # Split string based on one or more whitespace characters.
  # Then return the length of the array.
  value.split(/\s+/).length
end

value = "To be or not to be, that is the question."
puts wordcount(value)

value = "Stately, plump Buck Mulligan came from the stairhead"
puts wordcount(value)

puts wordcount " "
puts wordcount ""
```

Output

```
10
8
0
0
```
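One edge case the four test strings do not cover is leading whitespace. When "\s+" matches at the very start of the string, split keeps an empty first element, so the count is one too high. The sketch below shows this and a hypothetical variant, wordcount_awk, that calls split with no argument; with the default field separator, Ruby then splits on whitespace and ignores leading and trailing whitespace.

```ruby
# Same approach as the article: split on runs of whitespace.
def wordcount(value)
  value.split(/\s+/).length
end

# Hypothetical variant: split with no argument (default $; of nil)
# splits on whitespace and skips leading and trailing whitespace.
def wordcount_awk(value)
  value.split.length
end

padded = "  To be or not"
p padded.split(/\s+/)        # ["", "To", "be", "or", "not"]
puts wordcount(padded)       # 5
puts wordcount_awk(padded)   # 4
```

Both methods agree on the strings in the program above; they differ only when the input begins with whitespace.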

Split is one of the more powerful methods that accept a regular expression. Unlike match, which finds only the first place the pattern matches, split uses every match as a delimiter and returns an array of the substrings between those matches.
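To illustrate the difference, this sketch applies the same "\s+" pattern with match and with split. The sample string is arbitrary; pre_match and post_match are standard MatchData methods that return the text before and after the match.

```ruby
text = "one two three"

# match stops at the first run of whitespace.
m = text.match(/\s+/)
p m.pre_match    # "one"
p m.post_match   # "two three"

# split uses every run of whitespace as a delimiter.
p text.split(/\s+/)   # ["one", "two", "three"]
```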

So: It may not be intuitive to use split to count words, but this approach is effective and requires little extra code.


A loop-based method could be faster, because it would not need to allocate an array of substrings. But it would also add complexity to the program, and that extra code often causes more problems than a one-line call to split.
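For comparison, here is a minimal sketch of what such a loop might look like. The names wordcount_loop and WHITESPACE_BYTES are hypothetical, the code has not been benchmarked against split, and it assumes an ASCII-compatible encoding such as UTF-8, where every whitespace character that "\s" matches is a single byte.

```ruby
# Byte values of the whitespace characters \s matches:
# tab, newline, vertical tab, form feed, carriage return, space.
WHITESPACE_BYTES = [9, 10, 11, 12, 13, 32].freeze

# Hypothetical loop-based counter: walks the bytes once and counts
# each point where a non-whitespace byte follows whitespace (or the
# start of the string). No array of substrings is built.
def wordcount_loop(value)
  count = 0
  in_word = false
  value.each_byte do |b|
    if WHITESPACE_BYTES.include?(b)
      in_word = false
    elsif !in_word
      in_word = true
      count += 1
    end
  end
  count
end

puts wordcount_loop("To be or not to be, that is the question.")  # 10
```

Even this short version needs a state flag and a table of whitespace bytes, which is the kind of extra complexity the split approach avoids.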

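Summary. When counting words, the delimiter is what matters. Consecutive whitespace characters must be treated together, as one separator, not one at a time. The "\s+" pattern does this. Punctuation is not a delimiter with this pattern: it stays attached to the neighboring word ("be," counts as one word), so it does not change the count unless a punctuation mark stands alone between spaces.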
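A quick check of the punctuation behavior, using arbitrary sample strings: attached punctuation stays inside a word, while a lone dash becomes its own element and is counted.

```ruby
with_comma = "to be, or".split(/\s+/)
p with_comma            # ["to", "be,", "or"]

lone_dash = "Wait - what?".split(/\s+/)
p lone_dash             # ["Wait", "-", "what?"]
puts lone_dash.length   # 3
```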
Dot Net Perls
© 2007-2020 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.