Word Count
This page was last reviewed on Sep 20, 2024.
Dot Net Perls
Word count. How many words are in a string? To answer, we can develop a Python method, wordcount, that uses re.findall to count words.
Python method info. The wordcount() method locates and counts runs of non-whitespace characters with a special pattern. This allows us to avoid complex character testing in a loop.
Example. The re.findall method is the most important part of this solution. It finds all matches of a pattern within a string. When we count those matches, we have counted the words.
Note We specify the pattern \S+ in the re.findall method. This means "one or more non-whitespace characters."
Note 2 We use the len() built-in to count the number of elements in the resulting list. This equals the number of words in the input string.
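To make the two notes concrete, here is a minimal sketch of the intermediate list that re.findall returns before len() is applied. The variable names text and words are illustrative only and are not part of the page's example.

```python
import re

text = "To be or not to be"

# Each run of non-whitespace characters becomes one list element.
words = re.findall(r"\S+", text)
print(words)       # ['To', 'be', 'or', 'not', 'to', 'be']

# The length of the list is the word count.
print(len(words))  # 6
```

The pattern can also be written with parentheses, as in r"(\S+)". With a single capturing group, re.findall returns the text of that group, so the resulting list is the same.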
import re

def wordcount(value):
    # Find all runs of non-whitespace characters.
    # (The name "words" avoids shadowing the built-in list type.)
    words = re.findall(r"(\S+)", value)
    # Return length of resulting list.
    return len(words)

value = "To be or not to be, that is the question."
print(wordcount(value))

value = "Welcome, friend, to my website!"
print(wordcount(value))

value = ""
print(wordcount(value))
10
5
0
Method results. I verified that the method correctly counted the number of words in all three (trivial) examples. On more complex samples, such as ones involving markup, results may be less accurate.
So The example method does not count "word endings" but rather matches the words themselves, where a word is any run of non-whitespace characters.
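The cases below sketch where this definition of a word can diverge from what a reader might expect. The sample strings are made up for illustration, and wordcount is redefined here (without the capturing group) so the snippet runs on its own.

```python
import re

def wordcount(value):
    # Same approach as the page's example: count runs of non-whitespace.
    return len(re.findall(r"\S+", value))

# Punctuation attached to a word does not add to the count.
print(wordcount("Hello, world!"))         # 2

# A standalone dash is surrounded by whitespace, so it counts as a word.
print(wordcount("cats - dogs"))           # 3

# Markup tags separated by spaces are counted as words too.
print(wordcount("<p> Hello world </p>"))  # 4
```

If markup is likely to appear in the input, stripping it before counting is one way to bring the result closer to a visible-word count.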
Summary. The regular expression based method for counting words does not exactly mirror all word counting implementations. But in practice the results are close.
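As a rough illustration of how word-counting approaches can disagree, the sketch below compares the page's pattern with two alternatives. The function names are hypothetical, and the \w+ variant is an assumption about how another implementation might define a word, not something taken from this page.

```python
import re

def count_nonspace(value):
    # The approach from this page: runs of non-whitespace characters.
    return len(re.findall(r"\S+", value))

def count_split(value):
    # str.split() with no arguments splits on runs of whitespace.
    return len(value.split())

def count_wordchars(value):
    # Hypothetical variant: runs of letters, digits and underscores.
    return len(re.findall(r"\w+", value))

text = "It's a well-known fact"
print(count_nonspace(text))   # 4
print(count_split(text))      # 4
print(count_wordchars(text))  # 6
```

On this sample the whitespace-based counts agree, while the \w+ variant splits "It's" and "well-known" into two parts each. Which result is "right" depends on the definition of a word being used.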
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on Sep 20, 2024.
© 2007-2024 Sam Allen.